[ 
https://issues.apache.org/jira/browse/SPARK-23249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glen Takahashi updated SPARK-23249:
-----------------------------------
    Description: 
Change DataSourceScanExec so that when grouping blocks together into 
partitions, also checks the end of the sorted list of splits to more 
efficiently fill out partitions.

 
h2. Rationale

The current bin-packing method of next-fit descending for blocks into 
partitions is sub-optimal in a lot of cases and will result in extra 
partitions, un-even distribution of block-counts across partitions, and un-even 
distribution of partition sizes.

As an example, 128 files ranging from 1MB, 2MB,...127MB,128MB. will result in 
82 partitions with the current algorithm, but only 64 using this algorithm. 
Also in this example, the max # of blocks per partition in NFD is 13, while in 
this algorithm is is 2.

More generally, running a simulation of 1000 runs using 128MB blocksize, 
between 1-1000 normally distributed file sizes between 1-500Mb, you can see an 
improvement of approx 5% reduction of partition counts, and a large reduction 
in standard deviation of blocks per partition.

This algorithm also runs in O(n) time as NFD does, and in every case is 
strictly better results than NFD.

Overall, the more even distribution of blocks across partitions and therefore 
reduced partition counts should result in a small but significant performance 
increase across the board

  was:
Change DataSourceScanExec so that when grouping blocks together into 
partitions, also checks the end of the sorted list of splits to more 
efficiently fill out partitions.

 
h2. Rationale

The current bin-packing method of next-fit descending for blocks into 
partitions is sub-optimal in a lot of cases and will result in extra 
partitions, un-even distribution of block-counts across partitions, and un-even 
distribution of partition sizes.

As an example, 128 files ranging from 1MB, 2MB,...127MB,128MB. will result in 
82 partitions with the current algorithm, but only 64 using this algorithm. 
Also in this example, the max # of blocks per partition in NFD is 13, while in 
this algorithm is is 2.

More generally, running a simulation of 1000 runs using 128MB blocksize, 
between 1-1000 normally distributed file sizes between 1-500Mb, you can see an 
improvement of approx 5% reduction of partition counts, and a large reduction 
in standard deviation of blocks per partition.

This algorithm also runs in O(n) time as NFD does, and in every case is 
strictly better results than NFD.

Overall, the more even distribution of blocks across partitions and therefore 
reduced partition counts should result in a small but significant performance 
increase across the board


> Improve partition bin-filling algorithm to have less skew and fewer partitions
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-23249
>                 URL: https://issues.apache.org/jira/browse/SPARK-23249
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.1
>            Reporter: Glen Takahashi
>            Priority: Major
>
> Change DataSourceScanExec so that when grouping blocks together into 
> partitions, also checks the end of the sorted list of splits to more 
> efficiently fill out partitions.
>  
> h2. Rationale
> The current bin-packing method of next-fit descending for blocks into 
> partitions is sub-optimal in a lot of cases and will result in extra 
> partitions, un-even distribution of block-counts across partitions, and 
> un-even distribution of partition sizes.
> As an example, 128 files ranging from 1MB, 2MB,...127MB,128MB. will result in 
> 82 partitions with the current algorithm, but only 64 using this algorithm. 
> Also in this example, the max # of blocks per partition in NFD is 13, while 
> in this algorithm is is 2.
> More generally, running a simulation of 1000 runs using 128MB blocksize, 
> between 1-1000 normally distributed file sizes between 1-500Mb, you can see 
> an improvement of approx 5% reduction of partition counts, and a large 
> reduction in standard deviation of blocks per partition.
> This algorithm also runs in O(n) time as NFD does, and in every case is 
> strictly better results than NFD.
> Overall, the more even distribution of blocks across partitions and therefore 
> reduced partition counts should result in a small but significant performance 
> increase across the board



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to