[ https://issues.apache.org/jira/browse/SPARK-37357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
XiDuo You updated SPARK-37357:
------------------------------
    Description:
For example, `Rebalance` provides functionality that splits a large reduce partition into smaller ones. However, we have seen many SQL queries produce small files because of the last partition.

Let's say we have one reduce partition and six map partitions, and the blocks are:
[10, 10, 10, 10, 10, 10]
If the target size is 50, we will get two files of sizes 50 and 10. This gets worse when there are thousands of reduce partitions.

It would be helpful if we could control the min partition size.

  was:
`Rebalance` provides functionality that splits a large reduce partition into smaller ones. However, we have seen many SQL queries produce small files because of the last partition.

Let's say we have one reduce partition and six map partitions, and the blocks are:
[10, 10, 10, 10, 10, 10].
If the target size is 50, we will get two files of sizes 50 and 10. This gets worse when there are thousands of reduce partitions.

It would be helpful if we could merge the last small partition into the previous one.

> Create skew partition specs should respect min partition size
> -------------------------------------------------------------
>
>                 Key: SPARK-37357
>                 URL: https://issues.apache.org/jira/browse/SPARK-37357
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: XiDuo You
>            Priority: Major
>
> For example, `Rebalance` provides functionality that splits a large reduce
> partition into smaller ones. However, we have seen many SQL queries produce
> small files because of the last partition.
> Let's say we have one reduce partition and six map partitions, and the blocks
> are:
> [10, 10, 10, 10, 10, 10]
> If the target size is 50, we will get two files of sizes 50 and 10. This gets
> worse when there are thousands of reduce partitions.
> It would be helpful if we could control the min partition size.
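The arithmetic in the description can be reproduced with a small sketch. This is not Spark's implementation: `split_by_target` and `merge_small_tail` are hypothetical helper names, the greedy packing is a simplification, and the minimum size of 20 is an assumed value chosen only to illustrate the effect of folding a small trailing partition into the previous one.

```python
from typing import List


def split_by_target(sizes: List[int], target: int) -> List[List[int]]:
    """Greedily pack consecutive block sizes into groups of at most `target`."""
    groups: List[List[int]] = []
    current: List[int] = []
    for size in sizes:
        # Close the current group if adding this block would exceed the target.
        if current and sum(current) + size > target:
            groups.append(current)
            current = []
        current.append(size)
    if current:
        groups.append(current)
    return groups


def merge_small_tail(groups: List[List[int]], min_size: int) -> List[List[int]]:
    """Fold the last group into the previous one if it is below `min_size`."""
    if len(groups) >= 2 and sum(groups[-1]) < min_size:
        return groups[:-2] + [groups[-2] + groups[-1]]
    return groups


blocks = [10, 10, 10, 10, 10, 10]  # the six map blocks from the description
naive = split_by_target(blocks, target=50)
merged = merge_small_tail(naive, min_size=20)  # min_size=20 is an assumption

print([sum(g) for g in naive])   # [50, 10]
print([sum(g) for g in merged])  # [60]
```

With the assumed minimum of 20, the trailing partition of size 10 is absorbed and a single file of size 60 is written instead of two files of sizes 50 and 10. The merged partition exceeds the target of 50, which is the trade-off a minimum-size setting would have to accept.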
--
This message was sent by Atlassian Jira (v8.20.1#820001)