Github user dilipbiswal commented on the pull request: https://github.com/apache/spark/pull/8983#issuecomment-146047895 @marmbrus Thanks a lot michael for looking into this. I debugged hive to understand the behaviour and would like to share my findings. I wanted to make sure we are doing the right thing here. Here are the comments at the top of GenericUDFIn() in hive. /** * GenericUDFIn * * Example usage: * SELECT key FROM src WHERE key IN ("238", "1"); * * From MySQL page on IN(): To comply with the SQL standard, IN returns NULL * not only if the expression on the left hand side is NULL, but also if no * match is found in the list and one of the expressions in the list is NULL. * * Also noteworthy: type conversion behavior is different from MySQL. With * expr IN expr1, expr2... in MySQL, exprN will each be converted into the same * type as expr. In the Hive implementation, all expr(N) will be converted into * a common type for conversion consistency with other UDF's, and to prevent * conversions from a big type to a small type (e.g. int to tinyint) */ **(case 1) expr in (expr1,... exprN)** * **1.1** Per sql standard if expr is NULL then IN should return NULL (with this PR we are attempting to achieve this) * **1.2** if any of the expression in the right hand side is NULL and also no match is found in the list then IN should also return NULL. We also enforce this semantics by [implementation](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L124-L144) **(case 2) Type conversion semantics.** * **2.1** In MySQL all the expressions in the right hand side is converted to left hand side type. Our behaviour matches this semantics. I am not sure if this is the standard though. * **2.2** In Hive, they seem to find a common type (probably larger type) and promote both left hand side and right hand side to that common type. I believe this is where it throws the SemanticException. Our behaviour seems match that of MySql more at the present time. Do we want to change this ? also about case 1 , it is not clear from the hive comments on what they intended to do vs what is the external behaviour. Please let me know what you think.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org