Github user dilipbiswal commented on the pull request:

    https://github.com/apache/spark/pull/8983#issuecomment-146047895
  
    @marmbrus
    
    Thanks a lot michael for looking into this. I debugged hive to understand 
the 
    behaviour and would like to share my findings. I wanted to make sure we are 
doing 
    the right thing here. Here are the comments at the top of GenericUDFIn() in 
hive.
    
    /**
     * GenericUDFIn
     *
     * Example usage:
     * SELECT key FROM src WHERE key IN ("238", "1");
     *
     * From MySQL page on IN(): To comply with the SQL standard, IN returns NULL
     * not only if the expression on the left hand side is NULL, but also if no
     * match is found in the list and one of the expressions in the list is 
NULL.
     *
     * Also noteworthy: type conversion behavior is different from MySQL. With
     * expr IN expr1, expr2... in MySQL, exprN will each be converted into the 
same
     * type as expr. In the Hive implementation, all expr(N) will be converted 
into
     * a common type for conversion consistency with other UDF's, and to prevent
     * conversions from a big type to a small type (e.g. int to tinyint)
     */
    
    **(case 1) expr in (expr1,... exprN)**
    * **1.1** Per sql standard if expr is NULL then IN should return NULL
            (with this PR we are attempting to achieve this)
    * **1.2** if any of the expression in the right hand side is NULL and also
            no match is found in the list then IN should also return NULL.
            We also enforce this semantics by 
            
[implementation](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L124-L144)
    
    **(case 2) Type conversion semantics.**
    * **2.1** In MySQL all the expressions in the right hand side is converted 
            to left hand side type. Our behaviour matches this semantics. I am
            not sure if this is the standard though.
    * **2.2** In Hive, they seem to find a common type (probably larger type) 
and
            promote both left hand side and right hand side to that common type.
            I believe this is where it throws the SemanticException.
    
    Our behaviour seems match that of MySql more at the present time. Do we 
want to change this ? 
    also about case 1 , it is not clear from the hive comments on what they 
intended to
    do vs what is the external behaviour. Please let me know what you think.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to