GitHub user glentakahashi opened a pull request:

    https://github.com/apache/spark/pull/20372

    Improved block merging logic for partitions

    ## What changes were proposed in this pull request?
    
    Change DataSourceScanExec so that, when grouping blocks together into 
partitions, it also checks the end of the sorted list of splits to fill 
out partitions more efficiently.
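
As a rough illustration of the idea described above (not the actual Scala change in this pull request), the sketch below compares a front-only greedy pass over splits sorted by size with a variant that also tops up each partition from the small end of the sorted list. The function names, the `open_cost` parameter, and the sample sizes are hypothetical and chosen only for this example.

```python
from collections import deque


def pack_front_only(sizes, max_partition_bytes, open_cost=0):
    """Baseline sketch: walk splits from largest to smallest, closing a
    partition whenever the next split would not fit."""
    partitions, current, used = [], [], 0
    for size in sorted(sizes, reverse=True):
        if current and used + size > max_partition_bytes:
            partitions.append(current)
            current, used = [], 0
        current.append(size)
        used += size + open_cost
    if current:
        partitions.append(current)
    return partitions


def pack_both_ends(sizes, max_partition_bytes, open_cost=0):
    """Variant sketch: start each partition with the largest remaining
    split, then fill the leftover room with the smallest remaining splits
    (taken from the end of the sorted list)."""
    remaining = deque(sorted(sizes, reverse=True))
    partitions = []
    while remaining:
        largest = remaining.popleft()
        current = [largest]
        used = largest + open_cost
        # Keep pulling from the small end while the next split still fits.
        while remaining and used + remaining[-1] <= max_partition_bytes:
            smallest = remaining.pop()
            current.append(smallest)
            used += smallest + open_cost
        partitions.append(current)
    return partitions


if __name__ == "__main__":
    sizes = [9, 7, 3, 1]  # made-up split sizes, not taken from the Spark test
    print(pack_front_only(sizes, 10))  # [[9], [7, 3], [1]]
    print(pack_both_ends(sizes, 10))   # [[9, 1], [7, 3]]
```

With these made-up sizes, the front-only pass leaves the smallest split stranded in its own partition, while the two-ended variant uses it to fill the gap left by the largest split.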
    
    ## How was this patch tested?
    
    Updated the existing test to reflect the new logic, which reduces the 
number of partitions from 4 to 3.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/glentakahashi/spark feature/improved-block-merging

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20372.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20372
    
----
commit c575977a5952bf50b605be8079c9be1e30f3bd36
Author: Glen Takahashi <gtakahashi@...>
Date:   2018-01-23T23:22:34Z

    Improved block merging logic for partitions

----
