GitHub user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/23171

To sum up, I would set the goal of this PR to be making `In` expressions as efficient as possible for bytes/shorts/ints. Then we can benchmark `In` vs `InSet` in [SPARK-26203](https://issues.apache.org/jira/browse/SPARK-26203) and try to come up with a solution for `InSet` in [SPARK-26204](https://issues.apache.org/jira/browse/SPARK-26204). By the time we solve [SPARK-26204](https://issues.apache.org/jira/browse/SPARK-26204), we will have a clear understanding of the pros and cons of `In` and `InSet` and will be able to determine the right thresholds.
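
To make the trade-off behind those thresholds concrete, here is a minimal standalone sketch. It is not Spark code and does not reflect what this PR implements: the class `InVsInSetSketch` and its methods are hypothetical names, and the assumption is only that `In` behaves like a sequential comparison over primitive values while `InSet` behaves like a hash-set lookup that boxes each probed value. The timing loop is a naive illustration; a real comparison would need a proper harness such as JMH.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Spark.
public class InVsInSetSketch {

    // Assumed `In`-style check: compare the value against each candidate in turn.
    // Cost grows linearly with the list size, but no boxing is involved.
    public static boolean inList(int value, int[] candidates) {
        for (int c : candidates) {
            if (c == value) {
                return true;
            }
        }
        return false;
    }

    // Build the lookup structure once, as an `InSet`-style expression would.
    public static Set<Integer> toSet(int[] candidates) {
        Set<Integer> set = new HashSet<>();
        for (int c : candidates) {
            set.add(c);
        }
        return set;
    }

    // Assumed `InSet`-style check: one hash lookup, but the int is boxed to Integer.
    public static boolean inSet(int value, Set<Integer> candidates) {
        return candidates.contains(value);
    }

    public static void main(String[] args) {
        int probes = 1_000_000;
        for (int size : new int[] {2, 10, 50, 200}) {
            int[] list = new int[size];
            for (int i = 0; i < size; i++) {
                list[i] = i * 2; // even numbers only, so about half the probes miss
            }
            Set<Integer> set = toSet(list);

            long hits = 0; // accumulated and printed so the JIT cannot drop the loops
            long t0 = System.nanoTime();
            for (int p = 0; p < probes; p++) {
                if (inList(p % (2 * size), list)) hits++;
            }
            long t1 = System.nanoTime();
            for (int p = 0; p < probes; p++) {
                if (inSet(p % (2 * size), set)) hits++;
            }
            long t2 = System.nanoTime();

            System.out.printf("size=%d list=%dms set=%dms hits=%d%n",
                size, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, hits);
        }
    }
}
```

The list size at which the hash lookup starts to win is the kind of threshold the benchmarks in SPARK-26203 are meant to pin down; the numbers printed by this sketch depend on the JVM and hardware and should not be read as that threshold.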