Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22732
  
    There is an argument about whether #22259 introduced behavior changes. Here 
is my analysis.
    
    Before #22259, the type and null checks were done as follows:
    1. The user registers UDFs, e.g. `(a: Int) => xxx` or `(b: Any) => xxx`.
    2. At compile time, get the type info and set the input types, so that the analyzer can add casts or fail the query if the actual input data type doesn't match. Note that UDFs like `(b: Any) => xxx` have no type info, so we don't do a type check for them.
    3. At runtime, use reflection to get the type info again, and add a null check if an input is of primitive type.
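As a minimal, Spark-free sketch of step 2, the compiler's `TypeTag` can capture the declared input type at registration time; the names `inputType` and `UdfTypeInfoDemo` here are hypothetical, not Spark's actual API, but they illustrate why a UDF declared over `Any` yields no usable type info:

```scala
import scala.reflect.runtime.universe._

object UdfTypeInfoDemo {
  // Mimics how a framework could capture the input type when a UDF is
  // registered: the TypeTag is resolved at compile time from A.
  def inputType[A: TypeTag, B](f: A => B): Type = typeOf[A]

  def main(args: Array[String]): Unit = {
    println(inputType((a: Int) => a + 1)) // prints "Int" -> type check possible
    println(inputType((b: Any) => b))     // prints "Any" -> nothing to check
  }
}
```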
    
    After #22259, the type and null checks are done as follows:
    1. The user registers UDFs, e.g. `(a: Int) => xxx` or `(b: Any) => xxx`.
    2. At compile time, get the type info, and set the input types and input nullability, so that the analyzer can add casts or fail the query if the actual input data type doesn't match, and add null checks where necessary. Note that UDFs like `(b: Any) => xxx` have no type info, so we don't do type or null checks for them.
    
    So we may have a behavior change if users register UDFs in an unusual way, e.g. they define `(a: Int) => xxx` but cast it to `Any => xxx` during registration. Then we can't get the real type info at compile time.
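A small sketch of what can go wrong in that cast-at-registration case (plain Scala, no Spark involved; `NullCheckDemo` is a hypothetical name). With the type info erased and no null check inserted, a null input to a primitive-typed UDF is silently unboxed to the primitive default rather than failing or producing null:

```scala
object NullCheckDemo {
  // The UDF the user actually wrote, with a primitive input type.
  val typed: Int => Int = (a: Int) => a + 1

  // The registration-time cast that hides the real input type from the
  // framework: compile-time inspection now sees only Any => Any.
  val erased: Any => Any = typed.asInstanceOf[Any => Any]

  def main(args: Array[String]): Unit = {
    // With no null check, null is unboxed to 0 by the generic bridge
    // method, so the UDF silently returns 1 instead of failing.
    println(erased(null)) // prints "1"
  }
}
```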
    
    I'd say this is an invalid use case, because the data type check is lost as well, so I don't think we need to treat it as a behavior change.

