Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21403

@cloud-fan the problem is that the change is not needed only when `IN` is followed by a listquery; it is needed in the other case as well. The reason is that we have to tell these two kinds of queries apart:

1. `select 1 from (select (1, 'a') as col1) tab1 where col1 in (select 1, 'a')` or, equivalently, `select 1 from (select (1, 'a') as col1) tab1 where col1 in ((1, 'a'))`
2. `select 1 from (select 1 as col1, 'a' as col2) tab1 where (col1, col2) in (select 1, 'a')` or, equivalently, `select 1 from (select 1 as col1, 'a' as col2) tab1 where (col1, col2) in ((1, 'a'))`

Queries 1 are invalid, because they compare a single column with the 2 columns of the inner query/list of constants, while queries 2 are valid, because they compare 2 columns on both sides. I hope this clarifies why introducing a specific `InListQuery` couldn't solve the problem.

> It's not public so we can change it, but I believe some advanced users use these internal classes and we should keep these classes unchanged as possible as we can.

I agree with you on this point, which is why I changed my initial proposal from `Seq[Expression]` to introducing the new `InValues` expression. However, this might also break existing user code, since there is an extra expression that users wouldn't expect. So I think the two solutions are equivalent in this respect. The only thing we can do about it is wait for 3.0 to get this in, if we consider it a breaking change.
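
As a rough illustration of the difference between queries 1 and 2, here is a toy Python model. It is not Spark code: `Column`, `arity_matches`, and `collapse_to_struct` are hypothetical names, and the idea that a single-expression left-hand side keeps only the total field count is a simplifying assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Column:
    """Hypothetical column reference; num_fields > 1 models a struct column."""
    name: str
    num_fields: int = 1


def arity_matches(lhs: List[Column], rhs_num_columns: int) -> bool:
    """Check that the number of left-hand values equals the number of
    columns produced by the subquery or by each tuple of constants."""
    return len(lhs) == rhs_num_columns


def collapse_to_struct(lhs: List[Column]) -> int:
    """Model a single-expression left-hand side (assumption): the values are
    wrapped into one struct, so only the total field count survives."""
    return sum(c.num_fields for c in lhs)


# Queries 1: `col1 in (select 1, 'a')`, where col1 is one struct column.
q1_lhs = [Column("col1", num_fields=2)]
# Queries 2: `(col1, col2) in (select 1, 'a')`, two separate columns.
q2_lhs = [Column("col1"), Column("col2")]

rhs_num_columns = 2  # `select 1, 'a'` returns two columns

q1_valid = arity_matches(q1_lhs, rhs_num_columns)  # one value vs. two columns
q2_valid = arity_matches(q2_lhs, rhs_num_columns)  # two values vs. two columns

# With a single collapsed expression the two shapes look the same.
same_when_collapsed = collapse_to_struct(q1_lhs) == collapse_to_struct(q2_lhs)
```

In this model, keeping the left-hand side as a sequence of values is what lets the check reject queries 1 and accept queries 2; once the values are collapsed into one struct, both kinds of query look identical.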