GitHub user ericl opened a pull request:

    https://github.com/apache/spark/pull/14241

    [SPARK-16596] [SQL] Refactor DataSourceScanExec to do partition discovery 
at execution instead of planning time

    ## What changes were proposed in this pull request?
    
    Partition discovery is rather expensive, so we should do it at execution 
time instead of during physical planning. Right now there is not much benefit, 
since ListingFileCatalog will scan all partitions at planning time 
anyway, but this can be optimized in the future. Also, more information for 
partition pruning may be available at execution time than at planning time.
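
The idea of deferring file listing from planning to execution can be illustrated with a minimal Python sketch. It is not the Spark implementation: `LazyFileScan`, `list_files`, and the file paths are hypothetical names invented for illustration. The sketch assumes only that constructing the scan node corresponds to planning and that calling `execute()` corresponds to execution.

```python
from functools import cached_property
from typing import Callable, List


class LazyFileScan:
    """Hypothetical stand-in for a scan node; not Spark's DataSourceScanExec."""

    def __init__(self, list_files: Callable[[], List[str]]):
        # "Planning": only remember how to list files, do not list them yet.
        self._list_files = list_files

    @cached_property
    def selected_files(self) -> List[str]:
        # Runs on first access (during execution) and is cached afterwards.
        return self._list_files()

    def execute(self) -> List[str]:
        return [f"read {path}" for path in self.selected_files]


calls = []


def list_files() -> List[str]:
    # Hypothetical expensive listing; records each invocation.
    calls.append("listed")
    return ["part=1/a.parquet", "part=2/b.parquet"]


scan = LazyFileScan(list_files)  # "planning": no listing happens here
planned_calls = len(calls)
rows = scan.execute()            # "execution": listing happens once
scan.execute()                   # cached, so no second listing
executed_calls = len(calls)
```

In this sketch the listing cost is paid only if and when the scan actually runs, which is the property the refactoring is meant to enable.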
    
    TODO: In another PR, move DataSourceScanExec to its own file.
    
    ## How was this patch tested?
    
    Existing tests (it might be worth adding a test that catalog.listFiles() is 
delayed until execution, but this can wait until there is an actual 
benefit to doing so).
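
The shape of such a test can be sketched in Python with a mock catalog. This is an assumption-laden illustration, not a Spark test: `DeferredScan` and the `list_files` method are hypothetical analogues of the scan node and `catalog.listFiles()`.

```python
from unittest.mock import Mock


class DeferredScan:
    """Hypothetical scan node that lists files only when executed."""

    def __init__(self, catalog):
        self._catalog = catalog  # "planning": keep a reference, do not list

    def execute(self):
        return list(self._catalog.list_files())


def check_listing_is_deferred():
    catalog = Mock()
    catalog.list_files.return_value = ["part=1/a.parquet"]

    scan = DeferredScan(catalog)            # stands in for physical planning
    catalog.list_files.assert_not_called()  # nothing has been listed yet

    result = scan.execute()                 # stands in for execution
    catalog.list_files.assert_called_once()
    return result


result = check_listing_is_deferred()
```

The check fails with an `AssertionError` if listing happens during construction, which is the regression such a test would guard against.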

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ericl/spark refactor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14241.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14241
    
----
commit d04636474cce217106c1b3bfb60b5da54a53f7e5
Author: Eric Liang <e...@databricks.com>
Date:   2016-07-16T02:18:52Z

    Fri Jul 15 19:18:52 PDT 2016

commit 36d6ef44051a9ac57b9d6d1681aa9b11fa16d259
Author: Eric Liang <e...@databricks.com>
Date:   2016-07-17T21:28:47Z

    Sun Jul 17 14:28:47 PDT 2016

commit 6c0eb0e05238e21c68b3e26c1efad01c2af3e5e8
Author: Eric Liang <e...@databricks.com>
Date:   2016-07-17T21:29:46Z

    Sun Jul 17 14:29:46 PDT 2016

commit 1a4660286496663f6cb3414a22460e4fb24610b1
Author: Eric Liang <e...@databricks.com>
Date:   2016-07-17T21:36:58Z

    Sun Jul 17 14:36:58 PDT 2016

commit 538233499efce05379110d7210a0cdc7e25b699e
Author: Eric Liang <e...@databricks.com>
Date:   2016-07-17T21:42:32Z

    Sun Jul 17 14:42:32 PDT 2016

commit 98d6d74dde496b2256081e5840d56e91031e4db3
Author: Eric Liang <e...@databricks.com>
Date:   2016-07-17T21:55:13Z

    Sun Jul 17 14:55:13 PDT 2016

commit 0d4642a3cef757666fbc72932d3eb78bbaeec530
Author: Eric Liang <e...@databricks.com>
Date:   2016-07-17T22:12:24Z

    Sun Jul 17 15:12:24 PDT 2016

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
