[ 
https://issues.apache.org/jira/browse/SPARK-33700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33700:
-------------------------------------

    Assignee: Yang Jie

> Pushing down filters for Parquet and ORC should check an extra
> `filters.nonEmpty` condition
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-33700
>                 URL: https://issues.apache.org/jira/browse/SPARK-33700
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yang Jie
>            Assignee: Yang Jie
>            Priority: Minor
>
>  
> {code:java}
> lazy val footerFileMetaData =
>   ParquetFileReader.readFooter(conf, filePath, SKIP_ROW_GROUPS).getFileMetaData
> // Try to push down filters when filter push-down is enabled.
> val pushed = if (enableParquetFilterPushDown) {
>   val parquetSchema = footerFileMetaData.getSchema
>   val parquetFilters = new ParquetFilters(parquetSchema, pushDownDate, pushDownTimestamp,
>     pushDownDecimal, pushDownStringStartWith, pushDownInFilterThreshold, isCaseSensitive)
>   filters
>     // Collects all converted Parquet filter predicates. Notice that not all predicates can be
>     // converted (`ParquetFilters.createFilter` returns an `Option`). That's why a `flatMap`
>     // is used here.
>     .flatMap(parquetFilters.createFilter)
>     .reduceOption(FilterApi.and)
> } else {
>   None
> }
> {code}
>  
>  
> An extra `filters.nonEmpty` condition should be added when trying to push down
> filters for Parquet, to avoid an unnecessary file read (of the Parquet footer)
> when there are no filters to push. ORC has a similar problem.
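The proposed change can be sketched as follows. This is a minimal, self-contained model of the guard, not Spark's actual reader code: `Filter`, `readFooter`, and the counter below are hypothetical stand-ins used only to show how `filters.nonEmpty` short-circuits the expensive footer read.

```scala
// Hypothetical minimal model of the push-down decision; the types and
// helpers here are stand-ins, not Spark's actual API.
object FilterPushDownSketch {
  // Stand-in for a data-source filter predicate.
  case class Filter(name: String)

  // Counts simulated footer reads so the two variants can be compared.
  var footerReads = 0

  // Simulates the expensive Parquet footer read the guard is meant to avoid.
  def readFooter(): Unit = { footerReads += 1 }

  // Current behavior: the footer is read whenever push-down is enabled,
  // even when there are no filters to push.
  def pushedWithoutGuard(enabled: Boolean, filters: Seq[Filter]): Option[Filter] =
    if (enabled) {
      readFooter()
      filters.reduceOption((a, b) => Filter(s"and(${a.name},${b.name})"))
    } else {
      None
    }

  // Proposed behavior: additionally require `filters.nonEmpty`, so an
  // empty filter list skips the footer read entirely.
  def pushedWithGuard(enabled: Boolean, filters: Seq[Filter]): Option[Filter] =
    if (enabled && filters.nonEmpty) {
      readFooter()
      filters.reduceOption((a, b) => Filter(s"and(${a.name},${b.name})"))
    } else {
      None
    }
}
```

With an empty filter list, both variants return `None`, but only the unguarded one pays for the footer read; the same guard applies to the analogous ORC path.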



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
