Yang Jie created SPARK-33700:
--------------------------------

             Summary: Pushing down filters for Parquet and ORC should add a `filters.nonEmpty` condition
                 Key: SPARK-33700
                 URL: https://issues.apache.org/jira/browse/SPARK-33700
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Yang Jie

{code:java}
lazy val footerFileMetaData =
  ParquetFileReader.readFooter(conf, filePath, SKIP_ROW_GROUPS).getFileMetaData
// Try to push down filters when filter push-down is enabled.
val pushed = if (enableParquetFilterPushDown) {
  val parquetSchema = footerFileMetaData.getSchema
  val parquetFilters = new ParquetFilters(parquetSchema, pushDownDate, pushDownTimestamp,
    pushDownDecimal, pushDownStringStartWith, pushDownInFilterThreshold, isCaseSensitive)
  filters
    // Collects all converted Parquet filter predicates. Notice that not all predicates can be
    // converted (`ParquetFilters.createFilter` returns an `Option`). That's why a `flatMap`
    // is used here.
    .flatMap(parquetFilters.createFilter)
    .reduceOption(FilterApi.and)
} else {
  None
}
{code}

An extra `filters.nonEmpty` condition should be added when trying to push down filters for Parquet, to avoid unnecessarily reading the file (the Parquet footer) when there are no filters to push down.
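For example, a minimal sketch of the proposed guard, reusing the identifiers from the snippet above (a sketch, not the final patch):

{code:scala}
// Sketch only: same code as above, with `filters.nonEmpty` added to the
// condition. Because `footerFileMetaData` is a lazy val, skipping this
// branch when `filters` is empty means the Parquet footer is never read.
val pushed = if (enableParquetFilterPushDown && filters.nonEmpty) {
  val parquetSchema = footerFileMetaData.getSchema
  val parquetFilters = new ParquetFilters(parquetSchema, pushDownDate, pushDownTimestamp,
    pushDownDecimal, pushDownStringStartWith, pushDownInFilterThreshold, isCaseSensitive)
  filters
    .flatMap(parquetFilters.createFilter)
    .reduceOption(FilterApi.and)
} else {
  None
}
{code}

Presumably the same `filters.nonEmpty` guard would apply to the analogous ORC push-down path mentioned in the summary.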
