Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22732

There is an argument about whether #22259 introduced behavior changes. Here is my analysis.

Before #22259, the type and null checks were done as follows:

1. The user registers UDFs, like `(a: Int) => xxx` or `(b: Any) => xxx`.
2. At compile time, Spark gets the type info and sets the input types, so that the analyzer can add casts or fail the query if the actual input data type doesn't match. Note that UDFs like `(b: Any) => xxx` have no type info, so we don't do a type check for them.
3. At runtime, Spark uses reflection to get the type info again and adds a null check for each input of primitive type.

After #22259, the type and null checks are done as follows:

1. The user registers UDFs, like `(a: Int) => xxx` or `(b: Any) => xxx`.
2. At compile time, Spark gets the type info and sets the input types and input nullability, so that the analyzer can add casts or fail the query if the actual input data type doesn't match, and add null checks where necessary. Note that UDFs like `(b: Any) => xxx` have no type info, so we don't do a type or null check for them.

So we may have a behavior change if users register UDFs in a weird way, e.g. they define `(a: Int) => xxx` but cast it to `Any => xxx` during registration. Then we can't get the real type info at compile time. I'd say this is an invalid use case, because the data type check is also lost, so I don't think we need to treat it as a behavior change.
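The "weird" registration case can be sketched in plain Scala, without Spark. Here `inputType` is a hypothetical helper that mimics how `udf.register` captures the argument type via a `TypeTag` at compile time; upcasting the function to `Any => Any` before registration is what loses the type info:

```scala
import scala.reflect.runtime.universe._

// Hypothetical stand-in for udf.register's type capture: the compile-time
// argument type A is recorded via an implicit TypeTag.
def inputType[A: TypeTag, R](f: A => R): Type = typeOf[A]

val g: Int => Int = (a: Int) => a + 1
println(inputType(g))  // Int: the analyzer could add a cast and a null check

// Upcasting to Any => Any before "registration" erases the type info,
// so no type or null check is possible for this function.
val h: Any => Any = g.asInstanceOf[Any => Any]
println(inputType(h))  // Any
```

This is only an illustration of the erasure mechanism; in Spark the captured types additionally drive the analyzer's cast insertion and, after #22259, the null-check insertion.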