[ 
https://issues.apache.org/jira/browse/SPARK-37357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You updated SPARK-37357:
------------------------------
    Description: 
`Rebalance` provides functionality that splits a large reduce partition into 
smaller ones. However, we have seen many SQL queries produce small files 
because of the last partition.

For example, suppose we have one reduce partition and six map partitions, and 
the block sizes are [10, 10, 10, 10, 10, 10]. If the target size is 50, we get 
two files of sizes 50 and 10. The problem gets worse when there are thousands 
of reduce partitions.

It would be helpful to merge the last small partition into the previous one.
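
The example can be reproduced with a small sketch. This is not Spark's actual implementation: the function name `split_blocks`, the parameter `small_partition_factor`, and the exact merge rule are hypothetical, chosen only to illustrate the idea of folding a small trailing partition into the one before it.

```python
def split_blocks(blocks, target_size, small_partition_factor=0.0):
    """Greedily pack consecutive map-output blocks into partitions.

    Hypothetical illustration only, not Spark's code. A partition is closed
    as soon as adding the next block would exceed target_size. Afterwards,
    if the last partition is smaller than
    target_size * small_partition_factor, it is merged into the previous one.
    Returns the list of resulting partition sizes.
    """
    partitions = []
    current = []
    current_size = 0
    for size in blocks:
        # Close the current partition if this block would overflow it.
        if current and current_size + size > target_size:
            partitions.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        partitions.append(current)

    # Assumed merge rule: fold a too-small last partition into its neighbor.
    threshold = target_size * small_partition_factor
    if len(partitions) > 1 and sum(partitions[-1]) < threshold:
        last = partitions.pop()
        partitions[-1].extend(last)

    return [sum(p) for p in partitions]


blocks = [10, 10, 10, 10, 10, 10]
print(split_blocks(blocks, target_size=50))                              # [50, 10]
print(split_blocks(blocks, target_size=50, small_partition_factor=0.5))  # [60]
```

With the factor left at 0.0 the trailing partition of size 10 becomes its own file. With a factor of 0.5 it falls below the threshold of 25 and is merged, giving a single partition of size 60, slightly over the target.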

  was:
`Rebalance` provide a functionality that split the large reduce partition into 
smalls. However we have seen many SQL produce small files due to the last 
partition.

Let's say we have one reduce partition and three map partitions and the blocks 
are: [10, 10, 10, 10, 10, 10]. If the target size is 50, we will get two files 
with 50 and 10. And it will get worse if there are thousands of reduce 
partitions.

It should be helpful if we can merge the last small partition into previous.


> Add merged last partition factor for rebalance
> ----------------------------------------------
>
>                 Key: SPARK-37357
>                 URL: https://issues.apache.org/jira/browse/SPARK-37357
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: XiDuo You
>            Priority: Major
>
> `Rebalance` provides functionality that splits a large reduce partition 
> into smaller ones. However, we have seen many SQL queries produce small 
> files because of the last partition.
> For example, suppose we have one reduce partition and six map partitions, 
> and the block sizes are [10, 10, 10, 10, 10, 10]. If the target size is 50, 
> we get two files of sizes 50 and 10. The problem gets worse when there are 
> thousands of reduce partitions.
> It would be helpful to merge the last small partition into the previous one.



