guykhazma commented on issue #27157: [SPARK-30475][SQL] File source V2: Push data filters for file listing
URL: https://github.com/apache/spark/pull/27157#issuecomment-573543733

@gengliangwang by `"data skipping uniformly for all file based data sources"` I mean that the above approach works the same way for all formats, whether or not they support pushdown. (It also benefits formats that do support pushdown, such as Parquet, because it avoids the need to read the footer of each file.)
See for example this [Spark Summit talk](https://databricks.com/session/using-pluggable-apache-spark-sql-filters-to-help-gridpocket-users-keep-up-with-the-jones-and-save-the-planet).

Note that in datasource v1 the `dataFilters` are also passed to the `listFiles` call in the [`FileSourceScanExec`](https://github.com/apache/spark/blob/eefcc7d762a627bf19cab7041a1a82f88862e7e1/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L210) case class, which is used by all of the file-based data sources.
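To make the idea concrete, here is a minimal, Spark-independent sketch of data skipping at file-listing time. It is only an illustration under stated assumptions, not Spark's implementation: `FileEntry`, `GreaterThan`, and `list_files` are hypothetical names, and the per-file min/max statistics are assumed to live in some external metadata store rather than inside the files themselves.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FileEntry:
    """A data file plus externally stored per-column (min, max) stats (hypothetical)."""
    path: str
    stats: Dict[str, Tuple[int, int]] = field(default_factory=dict)


@dataclass
class GreaterThan:
    """Toy data filter `column > value` (hypothetical, not a Spark class)."""
    column: str
    value: int

    def may_match(self, stats: Dict[str, Tuple[int, int]]) -> bool:
        rng = stats.get(self.column)
        if rng is None:
            # No metadata for this column: the file cannot be skipped safely.
            return True
        _, max_value = rng
        # Some row can satisfy `column > value` only if the max exceeds value.
        return max_value > self.value


def list_files(files: List[FileEntry], data_filters: List[GreaterThan]) -> List[FileEntry]:
    """Keep only files that might contain matching rows; no file is opened."""
    return [f for f in files if all(flt.may_match(f.stats) for flt in data_filters)]


# The file format plays no role in the pruning decision.
files = [
    FileEntry("a.csv", {"temp": (0, 10)}),
    FileEntry("b.json", {"temp": (20, 30)}),
    FileEntry("c.parquet"),  # no stats recorded, so it is always kept
]
selected = list_files(files, [GreaterThan("temp", 15)])
print([f.path for f in selected])
```

Because the decision uses only the listing-time metadata, the CSV and JSON files are pruned in exactly the same way as the Parquet file would be, and no Parquet footer has to be read to skip a file.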