[ 
https://issues.apache.org/jira/browse/SPARK-38010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481614#comment-17481614
 ] 

gaoyajun02 commented on SPARK-38010:
------------------------------------

Can https://issues.apache.org/jira/browse/SPARK-34826 solve this? [~vsowrirajan] 

> Push-based shuffle disabled due to insufficient mergeLocations
> --------------------------------------------------------------
>
>                 Key: SPARK-38010
>                 URL: https://issues.apache.org/jira/browse/SPARK-38010
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: Shuffle, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: gaoyajun02
>            Priority: Major
>
> Currently, shuffle merger locations are obtained from the hosts of active 
> or dead executors.
> When dynamic executor allocation is enabled, there are often not enough 
> locations to satisfy push merge while an application runs its first few 
> stages, so most shuffles do not benefit from push-based shuffle.
> The first few shuffle write stages of Spark applications are generally the 
> stages that read tables or data sources, and they account for a large share 
> of the shuffled data. Because push-based shuffle is disabled for them, the 
> end-to-end improvement for Spark applications is very limited.
> I have thought of one possible approach, but I am not sure whether it is 
> feasible:
>  *  Lazily initialize the shuffle merger locations: after the mapper writes 
> its local shuffle data, it obtains the merger locations in the push thread.
> I am looking for advice and solutions on this issue.
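
To make the proposal concrete, here is a minimal sketch of why deferring the selection could help. It is a simplified model and not Spark's implementation: the function name, its parameters, the threshold rule, and the host names are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional


def select_merger_locations(
    candidate_hosts: List[str],
    mergers_desired: int,
    min_mergers_needed: int,
) -> Optional[List[str]]:
    """Return merger hosts, or None when too few hosts are known.

    Hypothetical rule: push-based shuffle is used only if at least
    min_mergers_needed distinct hosts are available.
    """
    distinct_hosts = sorted(set(candidate_hosts))
    if len(distinct_hosts) < min_mergers_needed:
        return None  # push-based shuffle disabled for this shuffle
    return distinct_hosts[:mergers_desired]


# Eager selection at stage submission: dynamic allocation has only
# started executors on two hosts so far, so the shuffle falls back.
hosts_at_submit = ["host-1", "host-2"]
eager = select_merger_locations(
    hosts_at_submit, mergers_desired=4, min_mergers_needed=3
)

# Lazy selection in the push thread, after the map output is written:
# more executors have registered by then, so selection succeeds.
hosts_at_push = hosts_at_submit + ["host-3", "host-4", "host-5"]
lazy = select_merger_locations(
    hosts_at_push, mergers_desired=4, min_mergers_needed=3
)

print(eager)
print(lazy)
```

In this model the same shuffle is rejected when locations are chosen at submission time but accepted when the choice is deferred to push time. Whether the real scheduler can safely defer the choice is the open question raised above.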


