[ https://issues.apache.org/jira/browse/SPARK-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214949#comment-14214949 ]
Apache Spark commented on SPARK-4453:
-------------------------------------

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/3317

> Simplify Parquet record filter generation
> -----------------------------------------
>
>                 Key: SPARK-4453
>                 URL: https://issues.apache.org/jira/browse/SPARK-4453
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.0.0, 1.1.0
>            Reporter: Cheng Lian
>
> The current Parquet record filter code uses {{CatalystFilter}} and its
> subclasses to represent all Spark SQL Parquet filters. Essentially, these
> classes combine the original Catalyst predicate expression with the
> generated Parquet filter. {{ParquetFilters.findExpression}} then uses these
> classes to pick out the expressions that can be pushed down.
> However, this {{findExpression}} function is unnecessary in the first place,
> since we already know whether a predicate can be pushed down while trying to
> generate its corresponding filter.
> With this in mind, the code size of Parquet record filter generation can be
> reduced significantly.
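For illustration only, below is a minimal, self-contained Scala sketch of the idea described in the issue: let the filter-generation function itself return an Option, so "can this predicate be pushed down?" is answered by the conversion result and no separate findExpression pass over CatalystFilter wrappers is needed. All type and method names here (Expression, FilterPredicate, createFilter, etc.) are simplified stand-ins, not the actual Spark SQL or Parquet classes.

{code:scala}
// Hypothetical stand-ins for Catalyst predicate expressions (not Spark's real classes).
sealed trait Expression
case class AttributeRef(name: String)                   extends Expression
case class Literal(value: Any)                          extends Expression
case class EqualTo(left: Expression, right: Expression) extends Expression
case class And(left: Expression, right: Expression)     extends Expression
case class IsNull(child: Expression)                    extends Expression // assumed not pushable here

// Hypothetical stand-in for Parquet's filter predicate type.
sealed trait FilterPredicate
case class Eq(column: String, value: Any)                         extends FilterPredicate
case class AndPred(left: FilterPredicate, right: FilterPredicate) extends FilterPredicate

object ParquetFilters {
  // Returns Some(filter) iff the predicate can be pushed down to Parquet.
  // Because the Option already encodes "pushable or not", no separate
  // findExpression step is required.
  def createFilter(predicate: Expression): Option[FilterPredicate] = predicate match {
    case EqualTo(AttributeRef(name), Literal(value)) => Some(Eq(name, value))
    case EqualTo(Literal(value), AttributeRef(name)) => Some(Eq(name, value))
    case And(left, right) =>
      // In this sketch a conjunction is pushed down only if both sides are
      // convertible; a real implementation could handle partial pushdown.
      for {
        l <- createFilter(left)
        r <- createFilter(right)
      } yield AndPred(l, r)
    case _ =>
      None // unsupported predicate: evaluate it in Spark after the scan
  }
}

// Usage: the caller splits predicates into pushed-down vs. residual ones
// simply by checking the Option returned by createFilter.
object Demo extends App {
  val predicates: Seq[Expression] = Seq(
    EqualTo(AttributeRef("a"), Literal(1)),
    IsNull(AttributeRef("b"))
  )
  val (pushed, residual) = predicates.partition(p => ParquetFilters.createFilter(p).isDefined)
  println(s"pushed: $pushed")
  println(s"residual: $residual")
}
{code}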