[ https://issues.apache.org/jira/browse/SPARK-29002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343137#comment-17343137 ]

Penglei Shi commented on SPARK-29002:
-------------------------------------

[~maryannxue] Hi Wei Xue, as the issue describes, the rule depends on the ratio 
of empty partitions. But in my scenario, I set the initial partition number to 
1000, and after a shuffle exchange there are 1000 small partitions, most of 
which are not empty. When the SMJ is changed to a BHJ, there are 1000 small 
tasks that cannot be coalesced and that produce a massive number of small 
files; too many small tasks also take more time to schedule. Is there a better 
way to cover this scenario?
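
To make the scenario concrete, here is a minimal sketch of the kind of 
empty-partition-ratio check the issue title describes. It is not Spark's actual 
code: the function names and the 0.2 threshold are assumptions for 
illustration only. It shows why a build side with 1000 small but non-empty 
partitions would not be caught by such a check.

```python
def non_empty_ratio(partition_sizes):
    """Fraction of shuffle partitions that contain any data."""
    if not partition_sizes:
        return 0.0
    non_empty = sum(1 for size in partition_sizes if size > 0)
    return non_empty / len(partition_sizes)


def should_avoid_broadcast(partition_sizes, min_non_empty_ratio=0.2):
    """Hypothetical check: skip SMJ -> BHJ when too few partitions hold data.

    The 0.2 default is an assumed value for this sketch.
    """
    return non_empty_ratio(partition_sizes) < min_non_empty_ratio


# Build side with a high ratio of empty partitions: 50 of 1000 hold data.
mostly_empty = [1024] * 50 + [0] * 950

# Scenario from this comment: 1000 tiny partitions, almost all non-empty.
small_non_empty = [1024] * 990 + [0] * 10

print(should_avoid_broadcast(mostly_empty))
print(should_avoid_broadcast(small_non_empty))
```

Under these assumptions the first case is kept as an SMJ, while the second 
still qualifies for the BHJ conversion even though every task is tiny, because 
the check looks only at how many partitions are empty and not at their sizes.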

> Avoid changing SMJ to BHJ if the build side has a high ratio of empty 
> partitions
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-29002
>                 URL: https://issues.apache.org/jira/browse/SPARK-29002
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wei Xue
>            Assignee: Wei Xue
>            Priority: Major
>             Fix For: 3.0.0
>
>




