gaoyajun02 created SPARK-38010:
----------------------------------

             Summary: Push-based shuffle disabled due to insufficient mergeLocations
                 Key: SPARK-38010
                 URL: https://issues.apache.org/jira/browse/SPARK-38010
             Project: Spark
          Issue Type: Improvement
          Components: Shuffle, Spark Core
    Affects Versions: 3.1.0
            Reporter: gaoyajun02


Currently, shuffle merger locations are chosen from the hosts of active or 
dead executors.
When dynamic resource allocation is enabled, there are often not enough 
locations to satisfy push merge while the application is submitting its first 
few stages, so most shuffles do not benefit from push-based shuffle.
The first few shuffle write stages of a Spark application are generally the 
stages that read tables or data sources, and they account for a large share of 
the shuffled data. Because push cannot be used for these stages, the 
end-to-end improvement for Spark applications is very limited.
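
To make the failure mode concrete, here is a toy model of eager merger 
selection. It is only a sketch and not Spark's actual code: the function 
name, the host names, and the minimum of 5 mergers are all assumptions made 
for illustration.

```python
from typing import List, Sequence


def select_mergers_eagerly(known_hosts: Sequence[str], min_mergers: int) -> List[str]:
    """Pick merger hosts at stage submission time.

    Hypothetical helper: returns an empty list (push disabled for the
    stage) when fewer than `min_mergers` distinct hosts are known.
    """
    unique_hosts = sorted(set(known_hosts))
    if len(unique_hosts) < min_mergers:
        return []
    return unique_hosts


# Hosts of executors registered when the first stage is submitted.
hosts_at_submit = ["host-1", "host-2"]
# Hosts known a little later, once dynamic allocation has scaled up.
hosts_after_scale_up = [f"host-{i}" for i in range(1, 9)]

early = select_mergers_eagerly(hosts_at_submit, min_mergers=5)
late = select_mergers_eagerly(hosts_after_scale_up, min_mergers=5)
print(len(early), len(late))
```

In this model the first stage gets no merger locations even though enough 
hosts become available shortly afterwards.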

I have thought of one possible approach, but I am not sure whether it is 
feasible:
 *  Lazily initialize the shuffle merger locations: after the mapper writes 
the local shuffle data, it obtains the merge locations in the push thread.
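
The proposal above could be sketched roughly as follows. This is a minimal 
illustration under stated assumptions, not an implementation against Spark's 
internals: the class `LazyMergerLocations`, the `list_hosts` callback, and 
the minimum of 5 mergers are hypothetical names and values.

```python
import threading
from typing import Callable, List, Optional


class LazyMergerLocations:
    """Hypothetical holder that resolves merger hosts on first successful lookup."""

    def __init__(self, list_hosts: Callable[[], List[str]], min_mergers: int):
        self._list_hosts = list_hosts      # assumed callback returning current hosts
        self._min_mergers = min_mergers    # assumed minimum for push to be worthwhile
        self._lock = threading.Lock()
        self._resolved: Optional[List[str]] = None

    def get(self) -> List[str]:
        """Called from the push thread after the map output has been written."""
        with self._lock:
            if self._resolved is None:
                hosts = sorted(set(self._list_hosts()))
                if len(hosts) >= self._min_mergers:
                    # Cache the list so every later caller sees the same mergers.
                    self._resolved = hosts
            return list(self._resolved) if self._resolved is not None else []


cluster_hosts = ["host-1", "host-2"]
lazy = LazyMergerLocations(lambda: list(cluster_hosts), min_mergers=5)

at_submit = lazy.get()                    # too few hosts at stage submission
cluster_hosts.extend(f"host-{i}" for i in range(3, 9))
after_map_write = lazy.get()              # enough hosts once executors have arrived
cluster_hosts.append("host-9")
later = lazy.get()                        # cached list, unaffected by host-9
print(len(at_submit), len(after_map_write), later == after_map_write)
```

The sketch caches the list once it is resolved, on the assumption that all 
mappers of a stage need to agree on the same merger locations.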

I am looking for advice and solutions on this issue.



