GitHub user rdblue opened a pull request:

    https://github.com/apache/spark/pull/20988

    [SPARK-23877][SQL]: Use filter predicates to prune partitions in 
metadata-only queries

    ## What changes were proposed in this pull request?
    
    This updates the OptimizeMetadataOnlyQuery rule to push filter expressions 
down into the partition listing when the logical plan contains Filter nodes. 
This avoids listing every partition of a large table on the driver.
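    To illustrate (the table and column names below are hypothetical, not 
from the patch), a metadata-only query is one that can be answered from 
partition values alone:
    
    ```scala
    // Assuming a table `t` partitioned by an integer column `part`.
    // An aggregate over partition columns only, like max(part), is a
    // metadata-only query: it can be answered from the catalog's
    // partition values without scanning any data files. With this
    // change, the WHERE predicate is passed to the catalog's partition
    // listing, so only matching partitions are fetched to the driver
    // rather than all partitions of the table.
    spark.sql("SELECT max(part) FROM t WHERE part > 100")
    ```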
    
    This also fixes a minor bug where the partitions returned from fsRelation 
cannot be serialized without hitting a StackOverflowError. The cause is 
serializing a Stream to executors: a Stream is a recursive (head, 
rest-of-stream) structure, so serializing a long one recurses once per 
element and exceeds the maximum stack depth. The fix is to create the 
LocalRelation from an Array instead of the incoming Seq.
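    The failure mode can be reproduced outside Spark. This sketch (my own 
illustration, not code from the patch) serializes a long Scala Stream the 
way shipping it inside a task closure would:
    
    ```scala
    import java.io.{ByteArrayOutputStream, ObjectOutputStream}
    
    // A Stream is a recursive cons structure (head, rest-of-stream),
    // so default Java serialization recurses once per element.
    val partitions: Stream[Int] = Stream.range(0, 1000000)
    
    try {
      new ObjectOutputStream(new ByteArrayOutputStream())
        .writeObject(partitions)
    } catch {
      case _: StackOverflowError =>
        println("overflowed while serializing the Stream")
    }
    
    // Materializing to an Array first (as the fix does when building
    // the LocalRelation) serializes iteratively, without deep recursion.
    new ObjectOutputStream(new ByteArrayOutputStream())
      .writeObject(partitions.toArray)
    ```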
    
    ## How was this patch tested?
    
    Existing tests for metadata-only queries.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rdblue/spark 
SPARK-23877-metadata-only-push-filters

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20988.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20988
    
----
commit 552efaee05b64f9ed4d5496b3b1d11b57b985f85
Author: Ryan Blue <blue@...>
Date:   2018-03-14T21:50:11Z

    Support filter conditions in metadata-only queries.

commit 2345896288828aefe14ebcb370d374b348400cf4
Author: Ryan Blue <blue@...>
Date:   2018-03-14T22:43:56Z

    Ensure partition data is an Array.
    
    The LocalRelation created for partition data for metadata-only queries
    may be a stream produced by listing directories. If the stream is large,
    serializing the LocalRelation to executors results in a stack overflow
    because the stream is a recursive structure of (head, rest-of-stream).

----


---
