Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22597
  
    > I haven't looked into it, but parquet record-level filtering is disabled 
by default, so if we remove predicates from the Spark side, the result can be 
wrong even if the predicates are pushed to parquet.
    
    That's explicitly enabled for the parquet tests (it's disabled by default, 
FWIW). For the ORC tests, since ORC doesn't support record-by-record 
filtering, they check whether the output is smaller than the original data.
    
    Some parquet tests do this as well for instance, 
    
    
https://github.com/apache/spark/blob/5d726b865948f993911fd5b9730b25cfa94e16c7/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala#L1016-L1040
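The two test strategies above (an exact-output check when the reader filters record by record, versus a weaker "output shrank" check when it can only skip coarse chunks) can be sketched without Spark. This is a plain Python analogue, not Spark or ORC/Parquet API; `scan_with_pushdown`, the group size, and the sample data are all illustrative stand-ins.

```python
def scan_with_pushdown(rows, predicate, record_level_filtering):
    """Simulate a columnar scan with a pushed-down predicate.

    With record-level filtering, the reader drops every non-matching row
    itself. Without it (the ORC-style case), the reader can only skip
    whole chunks (row groups / stripes), so non-matching rows that share
    a chunk with a match still survive the scan.
    """
    if record_level_filtering:
        return [r for r in rows if predicate(r)]
    group_size = 2  # illustrative chunk size
    out = []
    for i in range(0, len(rows), group_size):
        group = rows[i:i + group_size]
        if any(predicate(r) for r in group):  # keep chunk if any row matches
            out.extend(group)
    return out


rows = list(range(10))
pred = lambda r: r < 3

# Parquet-style check: with record-level filtering enabled, the scan
# output must match the predicate exactly.
exact = scan_with_pushdown(rows, pred, record_level_filtering=True)
assert exact == [0, 1, 2]

# ORC-style check: the reader only skips chunks, so the test can only
# assert the output is smaller than the original data.
coarse = scan_with_pushdown(rows, pred, record_level_filtering=False)
assert len(coarse) < len(rows)
```

Note that without the Spark-side filter on top, the coarse scan would return extra rows (here `3`), which is why removing the predicates from the Spark side can produce wrong results.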


---
