[jira] [Commented] (SPARK-17310) Disable Parquet's record-by-record filter in normal parquet reader and do it in Spark-side
[ https://issues.apache.org/jira/browse/SPARK-17310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15481044#comment-15481044 ]

Apache Spark commented on SPARK-17310:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/15049

> Disable Parquet's record-by-record filter in normal parquet reader and do it
> in Spark-side
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-17310
>                 URL: https://issues.apache.org/jira/browse/SPARK-17310
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
>
> Currently, we push filters down to the normal Parquet reader, which also
> filters record-by-record.
> It seems Spark-side codegen row-by-row filtering might be faster than
> Parquet's in general, because it avoids the type-boxing and virtual function
> calls that Parquet's filter incurs.
> Maybe we should perform a benchmark and disable this. This ticket came out of
> https://github.com/apache/spark/pull/14671
> Please refer to the discussion in that PR.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
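The boxing and virtual-call overhead the description refers to can be sketched with a hypothetical micro-example. Everything below is illustrative only — the generic predicate stands in for Parquet's record-level filter and the primitive loop stands in for what codegen emits; neither is the actual Parquet or Spark API:

```java
// Illustrative sketch: filtering through a generic predicate interface
// vs. a specialized primitive loop. All names are hypothetical; this is
// NOT Parquet's or Spark's real code.
import java.util.function.Predicate;

public class FilterOverheadSketch {

    // Generic path: each value is auto-boxed to Integer and dispatched
    // through a virtual Predicate.test call per record — similar in spirit
    // to a record-by-record filter behind a generic interface.
    static int countGeneric(int[] values, Predicate<Integer> p) {
        int n = 0;
        for (int v : values) {
            if (p.test(v)) n++;   // boxing + virtual dispatch per record
        }
        return n;
    }

    // Specialized path: the comparison is inlined over primitives —
    // similar in spirit to what whole-stage codegen produces.
    static int countPrimitive(int[] values, int threshold) {
        int n = 0;
        for (int v : values) {
            if (v > threshold) n++;  // no boxing, no virtual dispatch
        }
        return n;
    }

    public static void main(String[] args) {
        int[] values = new int[1000];
        for (int i = 0; i < values.length; i++) values[i] = i;
        int a = countGeneric(values, v -> v > 499);
        int b = countPrimitive(values, 499);
        // Both paths compute the same answer (500); only the per-record
        // cost model differs, which is what the ticket proposes to benchmark.
        System.out.println(a + " " + b);
    }
}
```

The two paths are semantically identical, so the question raised in the ticket is purely one of per-record cost, which is why a benchmark is the proposed next step.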
[jira] [Commented] (SPARK-17310) Disable Parquet's record-by-record filter in normal parquet reader and do it in Spark-side
[ https://issues.apache.org/jira/browse/SPARK-17310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15479530#comment-15479530 ]

Hyukjin Kwon commented on SPARK-17310:
--------------------------------------

[~andrew_duffy] Thanks Andrew. I will work on this.
[jira] [Commented] (SPARK-17310) Disable Parquet's record-by-record filter in normal parquet reader and do it in Spark-side
[ https://issues.apache.org/jira/browse/SPARK-17310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15448672#comment-15448672 ]

Andrew Duffy commented on SPARK-17310:
--------------------------------------

+1 to this, see comments on https://github.com/apache/spark/pull/14671,
particularly rdblue's comment. We need to wait for the next release of Parquet
to be able to set the {{parquet.filter.record-level.enabled}} config.
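Once a Parquet release honoring that key ships, one plausible way to set it would be through Spark's `spark.hadoop.*` pass-through, which forwards properties into the Hadoop configuration handed to the Parquet reader. A sketch for `spark-defaults.conf` — the key's exact behavior depends on the Parquet version actually picked up:

```
# Hypothetical: forward the Parquet option via Spark's spark.hadoop.* pass-through.
# Only takes effect with a Parquet release that reads parquet.filter.record-level.enabled.
spark.hadoop.parquet.filter.record-level.enabled   false
```

With record-level filtering off, Parquet would still do row-group/statistics pruning from the pushed-down filters, while the per-record evaluation is left to Spark's codegen'd filter.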