[ https://issues.apache.org/jira/browse/SPARK-29002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343137#comment-17343137 ]
Penglei Shi commented on SPARK-29002:
-------------------------------------

[~maryannxue] Hi Wei Xue, as the issue describes, the rule depends on the ratio of empty partitions. In my scenario, however, I set the initial partition number to 1000, and after a shuffle exchange there are 1000 small partitions, most of which are not empty. When the SMJ is changed to a BHJ, this results in 1000 small tasks that cannot be coalesced and that produce a large number of small files; so many small tasks also take more time to schedule. Is there a better way to cover this scenario?

> Avoid changing SMJ to BHJ if the build side has a high ratio of empty partitions
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-29002
>                 URL: https://issues.apache.org/jira/browse/SPARK-29002
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wei Xue
>            Assignee: Wei Xue
>            Priority: Major
>             Fix For: 3.0.0