Hey, the Iceberg Spark data source rewrites IN predicates as a combination of OR and EQ predicates. I am wondering whether it makes sense to introduce a threshold on when this rewrite happens until [1] is resolved. We could have something similar to "spark.sql.parquet.pushdown.inFilterThreshold" in Spark.
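To make the idea concrete, here is a rough sketch of a threshold-gated rewrite, written in Python only for brevity. The Eq/Or classes, the convert_in function and DEFAULT_IN_THRESHOLD are all hypothetical and are not Iceberg's expression API. The default of 10 is borrowed from Spark's default for spark.sql.parquet.pushdown.inFilterThreshold. When the IN list has more distinct values than the threshold, the sketch simply skips pushdown and leaves the predicate for Spark to evaluate after the scan.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Optional, Sequence, Union


# Hypothetical, minimal expression model for illustration only;
# these are NOT Iceberg's Expression classes.
@dataclass(frozen=True)
class Eq:
    column: str
    value: object


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Eq, Or]

# Hypothetical setting; 10 mirrors Spark's default inFilterThreshold.
DEFAULT_IN_THRESHOLD = 10


def convert_in(column: str, values: Sequence[object],
               threshold: int = DEFAULT_IN_THRESHOLD) -> Optional[Expr]:
    """Rewrite `column IN (values)` as EQ predicates joined by OR.

    Returns None (i.e. do not push the predicate down) when the list is
    empty or has more distinct values than `threshold`.
    """
    distinct = list(dict.fromkeys(values))  # drop duplicates, keep order
    if not distinct or len(distinct) > threshold:
        return None  # leave the IN for Spark to evaluate after the scan
    eqs = [Eq(column, v) for v in distinct]
    return reduce(Or, eqs)  # left-deep OR chain of EQ predicates
```

For example, convert_in("id", [1, 2, 2]) yields Or(Eq("id", 1), Eq("id", 2)), while convert_in("id", range(50)) yields None, so the row filter handed to the scan stays small for large IN lists.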
We have experienced a performance degradation on a few queries. One of the queries had 5 predicates, 2 of which were IN. In this specific case, the IN predicates didn't help to filter out files and just made the overall row filter more complicated.

Thanks,
Anton

[1] - https://github.com/apache/incubator-iceberg/issues/39