[ https://issues.apache.org/jira/browse/SPARK-17310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15481044#comment-15481044 ]
Apache Spark commented on SPARK-17310:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/15049

> Disable Parquet's record-by-record filter in normal parquet reader and do it
> in Spark-side
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-17310
>                 URL: https://issues.apache.org/jira/browse/SPARK-17310
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
>
> Currently, we push filters down to the normal Parquet reader, which also
> filters record-by-record.
> Spark-side codegen row-by-row filtering might be faster than Parquet's in
> general, due to the type boxing and virtual function calls that Spark's
> implementation avoids.
> We should run a benchmark and, if it confirms this, disable Parquet's
> record-level filtering. This ticket originated from
> https://github.com/apache/spark/pull/14671
> Please refer to the discussion in that PR.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
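
For context, the two layers of Parquet filtering discussed in the ticket can be sketched with a short PySpark session snippet. This is a hedged illustration, not part of the ticket: `spark.sql.parquet.filterPushdown` is a real, long-standing config, while `spark.sql.parquet.recordLevelFilter.enabled` is the toggle introduced by the work around this issue, so its name and default may differ depending on your Spark version; the Parquet path and column name are hypothetical.

```python
# Sketch (assumption-laden): separating Parquet predicate pushdown, which
# skips whole row groups/pages via statistics, from Parquet's record-by-record
# evaluation, which this ticket proposes leaving to Spark's generated code.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-filter-demo").getOrCreate()

# Keep pushing predicates into the Parquet reader so coarse-grained
# row-group/page skipping still happens (default is already true).
spark.conf.set("spark.sql.parquet.filterPushdown", "true")

# Leave Parquet's per-record filtering off, so the final row-level filter
# runs in Spark's codegen instead (config name per the follow-up work;
# availability depends on Spark version).
spark.conf.set("spark.sql.parquet.recordLevelFilter.enabled", "false")

df = spark.read.parquet("/tmp/example.parquet")   # hypothetical path
df.filter(df["value"] > 10).explain()             # hypothetical column;
# the pushed predicate shows up in the scan node of the physical plan,
# while the row-level Filter remains a Spark operator.
```

This only shows where each filter runs; the ticket's open question is whether the benchmark confirms that Spark-side codegen filtering is actually faster.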