[ https://issues.apache.org/jira/browse/SPARK-38010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
gaoyajun02 updated SPARK-38010:
-------------------------------
    Parent: SPARK-33235
    Issue Type: Sub-task  (was: Improvement)

> Push-based shuffle disabled due to insufficient mergeLocations
> --------------------------------------------------------------
>
>                 Key: SPARK-38010
>                 URL: https://issues.apache.org/jira/browse/SPARK-38010
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Shuffle, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: gaoyajun02
>            Priority: Major
>
> The shuffle merger locations are currently derived from the hosts of
> active or dead executors.
> When dynamic executor allocation is enabled, an application often does not
> yet have enough merger locations when it submits its first few stages to
> satisfy push merge, so most shuffles do not benefit from push-based shuffle.
> The first few shuffle write stages of a Spark application are generally the
> stages that read tables or data sources, and they account for a large share
> of the shuffled data. Because push-merge shuffle is disabled for these
> stages, the end-to-end improvement for Spark applications is very limited.
> I have thought of a possible approach, but I am not sure whether it is
> feasible:
> * Lazily initialize the shuffle merger locations: after the mapper writes
> its local shuffle data, it obtains the merge locations in the push thread.
> Looking for advice and solutions on this issue.
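
To make the lazy initialization idea above concrete, here is a minimal Python sketch. It is not Spark code: `LazyMergerLocations`, `fetch_hosts`, and `min_mergers` are hypothetical names invented for illustration, and the "enough hosts" check is an assumed stand-in for whatever threshold the scheduler actually applies. The point is only that the locations are resolved on first use (for example from the push thread, after the map output is written) and fixed once enough hosts are known.

```python
import threading
from typing import Callable, List, Optional


class LazyMergerLocations:
    """Hypothetical holder that resolves merger locations on first use
    instead of at stage submission time."""

    def __init__(self, fetch_hosts: Callable[[], List[str]], min_mergers: int):
        self._fetch_hosts = fetch_hosts    # assumed callback returning known hosts
        self._min_mergers = min_mergers    # assumed minimum number of mergers
        self._lock = threading.Lock()
        self._locations: Optional[List[str]] = None

    def get(self) -> List[str]:
        """Return the fixed locations, or try again to resolve them."""
        with self._lock:
            if self._locations is None:
                hosts = self._fetch_hosts()
                # Fix the locations only once enough hosts are available.
                if len(hosts) >= self._min_mergers:
                    self._locations = list(hosts)
            # An empty list means "not enough mergers yet, skip the push".
            return list(self._locations or [])


# Usage: hosts appear over time, as they would with dynamic allocation.
known_hosts = ["host-a"]
lazy = LazyMergerLocations(lambda: list(known_hosts), min_mergers=2)

first = lazy.get()            # too few hosts, nothing is fixed yet
known_hosts.append("host-b")
second = lazy.get()           # enough hosts now, locations are fixed
known_hosts.append("host-c")
third = lazy.get()            # later hosts do not change the fixed set
```

Fixing the set on the first successful lookup keeps every mapper of a stage pushing to the same mergers, which is the property a real implementation would also need to preserve.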