[ https://issues.apache.org/jira/browse/SPARK-33700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun reassigned SPARK-33700:
-------------------------------------

    Assignee: Yang Jie

> Try to push down filters for parquet and orc should add extra
> `filters.nonEmpty` condition
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-33700
>                 URL: https://issues.apache.org/jira/browse/SPARK-33700
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yang Jie
>            Assignee: Yang Jie
>            Priority: Minor
>
> {code:java}
> lazy val footerFileMetaData =
>   ParquetFileReader.readFooter(conf, filePath, SKIP_ROW_GROUPS).getFileMetaData
> // Try to push down filters when filter push-down is enabled.
> val pushed = if (enableParquetFilterPushDown) {
>   val parquetSchema = footerFileMetaData.getSchema
>   val parquetFilters = new ParquetFilters(parquetSchema, pushDownDate, pushDownTimestamp,
>     pushDownDecimal, pushDownStringStartWith, pushDownInFilterThreshold, isCaseSensitive)
>   filters
>     // Collects all converted Parquet filter predicates. Notice that not all predicates can be
>     // converted (`ParquetFilters.createFilter` returns an `Option`). That's why a `flatMap`
>     // is used here.
>     .flatMap(parquetFilters.createFilter)
>     .reduceOption(FilterApi.and)
> } else {
>   None
> }
> {code}
>
> An extra `filters.nonEmpty` condition should be added when trying to push down filters for Parquet, to avoid unnecessary file reading (the Parquet footer) when there are no filters to push down; ORC has a similar problem.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
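The shape of the proposed fix can be sketched as follows. This is a minimal stand-alone illustration with hypothetical simplified types (`readFooter`, `pushedFilters` here are stand-ins, not Spark's actual classes): guarding the branch with a non-empty check means the expensive footer read is never reached when there are no candidate filters.

```java
import java.util.List;
import java.util.Optional;

public class PushdownGuardSketch {
    static int footerReads = 0; // counts how often the (expensive) footer is read

    // Stand-in for ParquetFileReader.readFooter, which performs real file I/O in Spark.
    static String readFooter() {
        footerReads++;
        return "file-meta-data";
    }

    // With the extra non-empty guard, the footer is only read when there is
    // at least one filter that might be converted and pushed down.
    static Optional<String> pushedFilters(boolean enablePushDown, List<String> filters) {
        if (enablePushDown && !filters.isEmpty()) {
            String footer = readFooter(); // only reached when pushdown can matter
            // Stand-in for flatMap(createFilter).reduceOption(FilterApi.and).
            return filters.stream().reduce((a, b) -> a + " AND " + b);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // No filters: even with pushdown enabled, the footer is never read.
        if (!pushedFilters(true, List.of()).isEmpty()) throw new AssertionError();
        if (footerReads != 0) throw new AssertionError();
        // One filter: the footer is read once and the predicate is kept.
        if (!pushedFilters(true, List.of("a > 1")).orElse("").equals("a > 1"))
            throw new AssertionError();
        if (footerReads != 1) throw new AssertionError();
        System.out.println("footer reads with no filters: 0, with one filter: " + footerReads);
    }
}
```

In Spark's actual `ParquetFileFormat`/`OrcFileFormat` code the same idea would amount to changing the condition to `enableParquetFilterPushDown && filters.nonEmpty` so the lazy footer metadata is never forced for filter pushdown on an empty filter list.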