Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22597

> I haven't looked into it, but Parquet record-level filtering is disabled by default, so if we remove the predicates on the Spark side, the result can be wrong even if the predicates are pushed to Parquet.

That's explicitly enabled for the Parquet tests (it's disabled by default, FWIW). For the ORC tests, since ORC doesn't support record-by-record filtering, they check that the output is smaller than the original data. Some Parquet tests do this as well, for instance: https://github.com/apache/spark/blob/5d726b865948f993911fd5b9730b25cfa94e16c7/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala#L1016-L1040
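To illustrate why dropping the Spark-side predicate is unsafe when only stats-based pushdown runs, here is a toy model (plain Scala, not Spark code; `RowGroup`, `scan`, and the min/max pruning are simplifications I made up). A pushed-down range predicate can only skip whole row groups via their min/max statistics; unless record-level filtering is also applied, rows that don't satisfy the predicate still come back, so the engine must re-apply the filter itself:

```scala
// Toy model of Parquet-style pushdown (hypothetical, for illustration only).
object PushdownModel {
  // A row group with its per-column min/max statistics derived from the rows.
  case class RowGroup(rows: Seq[Int]) {
    def min: Int = rows.min
    def max: Int = rows.max
  }

  // Stats-based pushdown: a row group is skipped only if its [min, max] range
  // cannot overlap [lo, hi]; surviving groups are read in full. Record-level
  // filtering (when enabled) additionally drops individual non-matching rows.
  def scan(groups: Seq[RowGroup], lo: Int, hi: Int,
           recordLevel: Boolean): Seq[Int] = {
    val surviving = groups.filter(g => g.max >= lo && g.min <= hi)
    val rows = surviving.flatMap(_.rows)
    if (recordLevel) rows.filter(x => x >= lo && x <= hi) else rows
  }

  def main(args: Array[String]): Unit = {
    val groups = Seq(RowGroup(Seq(1, 2, 9)), RowGroup(Seq(20, 30)))
    // Predicate: x in [2, 9]. The second group is pruned by stats, but the
    // first group is returned whole, so row 1 leaks through without
    // record-level filtering.
    println(scan(groups, 2, 9, recordLevel = false)) // List(1, 2, 9)
    println(scan(groups, 2, 9, recordLevel = true))  // List(2, 9)
  }
}
```

This is why the tests either explicitly enable record-level filtering (so the output can be compared exactly) or, as for ORC, only assert that pushdown reduced the number of rows read rather than that the result is already exact.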