[ https://issues.apache.org/jira/browse/SPARK-37357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
XiDuo You updated SPARK-37357:
------------------------------
    Summary: Add merged last partition factor for split skew partition  (was: Add merged last partition factor for rebalance)

> Add merged last partition factor for split skew partition
> ---------------------------------------------------------
>
>                 Key: SPARK-37357
>                 URL: https://issues.apache.org/jira/browse/SPARK-37357
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: XiDuo You
>            Priority: Major
>
> `Rebalance` provides functionality that splits a large reduce partition
> into smaller ones. However, we have seen many SQL queries produce small
> files because of the last partition.
> Let's say we have one reduce partition and three map partitions, the
> blocks are [10, 10, 10, 10, 10, 10], and the target size is 50. We will get
> two files, with sizes 50 and 10. It gets worse when there are thousands of
> reduce partitions.
> It would be helpful if we could merge the last small partition into the
> previous one.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
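
The example in the issue description (blocks [10, 10, 10, 10, 10, 10], target size 50) can be illustrated with a minimal sketch. This is not Spark's implementation: the function names `split_by_target_size` and `merge_small_last`, the `factor` parameter, and the rule "merge when the last partition is smaller than target size times factor" are all assumptions made here for illustration, since the issue does not spell out the exact semantics of the proposed factor.

```python
from typing import List


def split_by_target_size(blocks: List[int], target_size: int) -> List[int]:
    """Greedily pack consecutive blocks into partitions of roughly target_size.

    Hypothetical helper: a single block larger than target_size still
    forms its own partition.
    """
    sizes: List[int] = []
    current = 0
    for block in blocks:
        # Close the current partition if adding this block would overshoot.
        if current > 0 and current + block > target_size:
            sizes.append(current)
            current = 0
        current += block
    if current > 0:
        sizes.append(current)
    return sizes


def merge_small_last(sizes: List[int], target_size: int, factor: float) -> List[int]:
    """Fold the last partition into the previous one when it is small.

    Assumed rule (not taken from the issue): "small" means strictly
    below target_size * factor.
    """
    if len(sizes) >= 2 and sizes[-1] < target_size * factor:
        return sizes[:-2] + [sizes[-2] + sizes[-1]]
    return list(sizes)


blocks = [10, 10, 10, 10, 10, 10]
split = split_by_target_size(blocks, target_size=50)
merged = merge_small_last(split, target_size=50, factor=0.3)
print(split)   # [50, 10]
print(merged)  # [60]
```

With the assumed factor of 0.3, the trailing partition of size 10 is below 15 and is merged, so one file of size 60 is written instead of two files of sizes 50 and 10. The trade-off is that the merged partition exceeds the target size, which is why a factor is used to bound how large the absorbed remainder may be.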