Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22029

Thanks for your comment @HyukjinKwon. Yes sure, I'll update the PR description, thanks.

Yes, the previous behavior is the same as the Hive behavior. What I wanted to highlight there, though, is that every system (Hive included, in a way) has the same behavior when IN is used, regardless of whether IN is followed by a query or by a sequence of literals. This is, IMHO, extremely important. As an end user, if I have a query which involves an IN with a complex subquery and I am testing/debugging it, I can put the output I expect from the subquery as literals in order to check where my problem is (i.e. in the subquery or not). So it is critical that the behavior in these 2 cases is the same, otherwise I, as an end user, would be very confused about what is going on. Basically, I mean that running:

```
select a, b, c from t1 where (a, b, c) in (select 1, null, "abc");
```

and

```
select a, b, c from t1 where (a, b, c) in ((1, null, "abc"));
```

should return the same result (which is not the case in Spark before this PR, while it is in all other systems). Please notice that this PR focuses only on the case where the literal is a struct with more than one field.

In the case of Hive, it has the same behavior as Spark before this PR, but Hive doesn't allow subqueries which return more than one field. So, in the example above, the first query would throw an analysis exception, which is a limitation in Hive, but at least it doesn't confuse the user with different behaviors.

Hope this answers your question. Please let me know if additional details are required. Thanks.
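To make the expectation concrete, here is a minimal Python sketch of the standard SQL three-valued logic for a multi-column IN. It is only a model of the semantics discussed above, not Spark's implementation, and the helper names (`eq3`, `row_eq3`, `row_in3`, `where_in`) are hypothetical. `None` stands for SQL NULL, and the same evaluation is applied whether the candidate rows come from a subquery or from a list of literals.

```python
from typing import Any, List, Optional, Sequence, Tuple

Row = Tuple[Any, ...]

def eq3(x: Any, y: Any) -> Optional[bool]:
    """SQL-style equality: the result is NULL (None) if either side is NULL."""
    if x is None or y is None:
        return None
    return x == y

def row_eq3(left: Row, right: Row) -> Optional[bool]:
    """AND of the field-wise equalities under three-valued logic."""
    result: Optional[bool] = True
    for x, y in zip(left, right):
        r = eq3(x, y)
        if r is False:
            return False      # one definite mismatch makes the whole row unequal
        if r is None:
            result = None     # unknown so far, unless a later field mismatches
    return result

def row_in3(left: Row, candidates: Sequence[Row]) -> Optional[bool]:
    """OR of the row equalities under three-valued logic."""
    result: Optional[bool] = False
    for cand in candidates:
        r = row_eq3(left, cand)
        if r is True:
            return True       # one definite match is enough
        if r is None:
            result = None     # unknown so far, unless a later candidate matches
    return result

def where_in(rows: Sequence[Row], candidates: Sequence[Row]) -> List[Row]:
    """A WHERE clause keeps a row only when the predicate is TRUE (not NULL)."""
    return [row for row in rows if row_in3(row, candidates) is True]

# Hypothetical contents of t1, used only for illustration.
t1 = [(1, 2, "abc"), (1, None, "abc"), (2, 2, "abc")]

# The same candidate row, whether produced by `select 1, null, "abc"`
# or written as the literal (1, null, "abc").
from_subquery = [(1, None, "abc")]
from_literals = [(1, None, "abc")]

same_result = where_in(t1, from_subquery) == where_in(t1, from_literals)
```

Under this model the comparison against `(1, null, "abc")` is never TRUE, so both forms filter out every row, which is the kind of consistency between the two queries I am arguing for.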