Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22313

Thank you for the review, @kiszk.

First, I don't want to hold on to the memory after query completion. If we do, it will be a regression. So I wanted `time` first.

Second, it's difficult to estimate a sufficient limit for the number of filters.
- As we know from the codegen JVM limit issue, there are several attempts to generate a single complex query for wide tables (thousands of columns).
- Spark optimizer rules like `InferFiltersFromConstraints` add more constraints such as `NotNull(col1)`. Usually, the number of filters doubles here.
- Also, it's not a good design if we need to raise this limit whenever we add a new optimizer rule like `InferFiltersFromConstraints`.
- If the limit is too high, we waste memory. If the limit is too low, eviction will bite us again.

In short, `time` was sufficient and the simplest option for this purpose.
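
To make the trade-off concrete, here is a minimal sketch in plain Python, not the code from the PR. The class names, the `demo` helper, the limit of 4 entries, the 10-second TTL, and the expire-after-last-access policy are all assumptions made for illustration. It compares a size-bounded cache, whose limit was chosen for the user-written filters only, with a time-bounded cache when inferred `NotNull` constraints double the number of filters.

```python
import time
from collections import OrderedDict


class SizeBoundedCache:
    """LRU cache that evicts the least recently used entry past max_entries."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # drop the oldest entry

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)
        return self._data[key]


class TimeBoundedCache:
    """Cache whose entries expire ttl_seconds after their last access."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._data = {}  # key -> (value, last_access_time)

    def put(self, key, value):
        self._data[key] = (value, self._clock())

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, last_access = entry
        now = self._clock()
        if now - last_access >= self.ttl_seconds:
            del self._data[key]
            return None
        self._data[key] = (value, now)  # refresh the access time
        return value

    def purge_expired(self):
        """Drop expired entries and return how many entries remain."""
        now = self._clock()
        expired = [k for k, (_, t) in self._data.items()
                   if now - t >= self.ttl_seconds]
        for k in expired:
            del self._data[k]
        return len(self._data)


def demo():
    # Hypothetical workload: 4 user filters plus 4 inferred NotNull filters.
    filters = [f"col{i} > 0" for i in range(4)]
    filters += [f"NotNull(col{i})" for i in range(4)]

    sized = SizeBoundedCache(max_entries=4)  # sized for the user filters only
    for f in filters:
        sized.put(f, f.upper())
    sized_misses = sum(1 for f in filters if sized.get(f) is None)

    now = [0.0]  # fake clock so the example does not need to sleep
    timed = TimeBoundedCache(ttl_seconds=10, clock=lambda: now[0])
    for f in filters:
        timed.put(f, f.upper())
    timed_misses = sum(1 for f in filters if timed.get(f) is None)

    now[0] = 11.0  # the query has finished and the TTL has elapsed
    entries_left = timed.purge_expired()
    return sized_misses, timed_misses, entries_left
```

In this sketch the size-bounded cache loses the user-written filters as soon as the inferred ones arrive, while the time-bounded cache keeps every entry for the duration of the query and holds nothing once the TTL has passed. A real cache would also need to be thread-safe.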