Hey,

The Iceberg Spark data source rewrites IN predicates as a combination of OR/EQ
predicates. I am wondering if it makes sense to introduce a threshold on the
number of IN values above which this rewrite is skipped, at least until [1] is
resolved. We could have something similar to
“spark.sql.parquet.pushdown.inFilterThreshold” in Spark.
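
To make the idea concrete, here is a rough Python sketch of the behavior I have
in mind. It is not the actual Iceberg code: the names rewrite_in and
IN_FILTER_THRESHOLD and the tuple-based expression encoding are made up for
illustration, and the value 10 is only a placeholder.

```python
from functools import reduce

# Hypothetical default; in practice this would be a configurable setting.
IN_FILTER_THRESHOLD = 10


def rewrite_in(column, values, threshold=IN_FILTER_THRESHOLD):
    """Rewrite `column IN (values)` as an OR chain of EQ predicates.

    Returns None when the predicate should not be pushed down, either
    because the value list is empty or because it exceeds the threshold.
    Expressions are plain tuples: ("eq", column, value) and ("or", left, right).
    """
    # Drop duplicate values while preserving their original order.
    distinct = list(dict.fromkeys(values))
    if not distinct or len(distinct) > threshold:
        # Skip the rewrite; the engine still evaluates IN after the scan.
        return None
    eqs = [("eq", column, v) for v in distinct]
    # Fold the EQ predicates into a left-nested OR chain.
    return reduce(lambda left, right: ("or", left, right), eqs)
```

With this, a short list such as rewrite_in("id", [1, 2, 3]) becomes a nested
OR of three EQ predicates, while a list longer than the threshold is left out
of the pushed-down row filter entirely.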

We have experienced a performance degradation on a few queries. One of the
queries had 5 predicates, 2 of which were IN. In this specific case, the IN
predicates didn’t help to filter out any files and just made the overall row
filter more complicated.

Thanks,
Anton


[1] - https://github.com/apache/incubator-iceberg/issues/39
