[GitHub] [spark] guykhazma commented on issue #27157: [SPARK-30475][SQL] File source V2: Push data filters for file listing

GitBox Sun, 12 Jan 2020 23:49:36 -0800

guykhazma commented on issue #27157: [SPARK-30475][SQL] File source V2: Push 
data filters for file listing
URL: https://github.com/apache/spark/pull/27157#issuecomment-573543733
 
 
   @gengliangwang by `"data skipping uniformly for all file based data 
sources"` I mean that the above approach works the same way for every format, 
whether or not the format supports pushdown. 
   (It also benefits formats that do support pushdown, such as Parquet, by 
avoiding the need to read the footer of each file.)
   See for example this [Spark Summit 
talk](https://databricks.com/session/using-pluggable-apache-spark-sql-filters-to-help-gridpocket-users-keep-up-with-the-jones-and-save-the-planet).
   
   Note that in datasource v1 the `dataFilters` are also passed to the 
`listFiles` method in the 
[`FileSourceScanExec`](https://github.com/apache/spark/blob/eefcc7d762a627bf19cab7041a1a82f88862e7e1/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L210)
 case class, which is used by all of the file-based data sources.
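
   To make the idea concrete, here is a minimal sketch of data skipping at file-listing time. It is not Spark's implementation. It assumes per-file min/max statistics are kept outside the data files, so the same check works for any format and no file footer has to be opened. `FileStats`, `greater_than`, and `list_files` are hypothetical names; `list_files` is only loosely modeled on the role `dataFilters` plays in `listFiles`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical per-file statistics stored outside the data files (for example
# in a side index), so they are available for CSV, JSON, Parquet, and so on.
@dataclass(frozen=True)
class FileStats:
    path: str
    min_max: Dict[str, Tuple[float, float]]  # column name -> (min, max)


# A data filter is modeled as a predicate over the statistics. It returns
# False only when the file provably contains no matching rows.
DataFilter = Callable[[FileStats], bool]


def greater_than(column: str, value: float) -> DataFilter:
    """Build a filter for `column > value` evaluated against file statistics."""
    def might_match(stats: FileStats) -> bool:
        bounds = stats.min_max.get(column)
        if bounds is None:
            return True  # no statistics for this column: the file must be kept
        return bounds[1] > value  # keep the file only if its max exceeds value
    return might_match


def list_files(index: List[FileStats], data_filters: List[DataFilter]) -> List[str]:
    """Return the paths of files that might contain rows matching all filters."""
    return [s.path for s in index if all(f(s) for f in data_filters)]


index = [
    FileStats("part-0.csv", {"temp": (0.0, 10.0)}),
    FileStats("part-1.csv", {"temp": (5.0, 30.0)}),
    FileStats("part-2.json", {}),  # no statistics collected for this file
]

# Query filter: temp > 20. Only part-0.csv can be skipped safely.
selected = list_files(index, [greater_than("temp", 20.0)])
print(selected)
```

   The skipping is conservative: a file is dropped only when its statistics prove that no row can match, so the filters still have to be applied to the rows that are actually read.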


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
