GitHub user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22029
  
    Thanks for your comment @HyukjinKwon. Yes, sure, I'll update the PR
    description, thanks.
    
    Yes, the previous behavior is the same as Hive's. What I wanted to highlight
    there, though, is that every system (Hive included, in a way) behaves the
    same when IN is used, regardless of whether IN is followed by a subquery or
    by a list of literals. This is, IMHO, extremely important. As an end user, if
    I have a query with an IN over a complex subquery and I am testing/debugging
    it, I can replace the subquery with the output I expect from it, written as
    literals, in order to check where my problem is (i.e., in the subquery or
    not). So it is critical that the behavior in these two cases is the same;
    otherwise I, as an end user, would be very confused about what is going on.
    Basically, I mean that running:
    ```
    select a, b, c from t1 where (a, b, c) in (select 1, null, "abc");
    ```
    and 
    ```
    select a, b, c from t1 where (a, b, c) in ((1, null, "abc"));
    ```
    should return the same result (which is not the case in Spark before this
    PR, while it is in all the other systems).
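    To make the difference concrete, here is a minimal Python sketch of the two
    semantics, assuming standard SQL three-valued logic (None stands for NULL
    and for UNKNOWN). `in_fieldwise` models the multi-column IN as an OR of
    field-by-field ANDs, which is how the subquery form behaves;
    `in_struct_equality` is a hypothetical model of comparing the whole struct
    with NULL fields treated as equal, i.e. my reading of the pre-PR literal
    behavior, not Spark's actual code.

```python
from typing import Optional, Sequence, Tuple

# SQL three-valued logic: True, False, or None (NULL / UNKNOWN).
TriBool = Optional[bool]


def sql_eq(x, y) -> TriBool:
    """Scalar SQL equality: any NULL operand yields UNKNOWN."""
    if x is None or y is None:
        return None
    return x == y


def sql_and(values: Sequence[TriBool]) -> TriBool:
    """Three-valued AND: FALSE wins, then UNKNOWN, otherwise TRUE."""
    if any(v is False for v in values):
        return False
    if any(v is None for v in values):
        return None
    return True


def sql_or(values: Sequence[TriBool]) -> TriBool:
    """Three-valued OR: TRUE wins, then UNKNOWN, otherwise FALSE."""
    if any(v is True for v in values):
        return True
    if any(v is None for v in values):
        return None
    return False


def in_fieldwise(row: Tuple, candidates: Sequence[Tuple]) -> TriBool:
    """(a, b, c) IN (...) evaluated field by field, as for a subquery."""
    return sql_or(
        [sql_and([sql_eq(r, c) for r, c in zip(row, cand)]) for cand in candidates]
    )


def in_struct_equality(row: Tuple, candidates: Sequence[Tuple]) -> TriBool:
    """Hypothetical whole-struct comparison where NULL fields compare equal."""
    return any(row == cand for cand in candidates)


row = (1, None, "abc")
print(in_fieldwise(row, [(1, None, "abc")]))        # None: UNKNOWN, WHERE drops the row
print(in_struct_equality(row, [(1, None, "abc")]))  # True: the row is returned
```

    Since a WHERE clause keeps only rows whose predicate is TRUE, the two models
    return different results for the same row, which is exactly the
    inconsistency between the two queries above.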
    
    Please note that this PR focuses only on the case where the literal is a
    struct of more than one field.
    Hive has the same behavior as Spark did before this PR, but Hive doesn't
    allow subqueries that return more than one field. So, in the example above,
    the first query would throw an analysis exception in Hive. That is a
    limitation, but at least it doesn't confuse the user with different
    behaviors.
    
    Hope this answers your question. Please let me know if additional details 
are required. Thanks.
