[ 
https://issues.apache.org/jira/browse/SPARK-33574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17439603#comment-17439603
 ] 

Apache Spark commented on SPARK-33574:
--------------------------------------

User 'rmcyang' has created a pull request for this issue:
https://github.com/apache/spark/pull/34500

> Improve locality for push-based shuffle especially for join like operations
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-33574
>                 URL: https://issues.apache.org/jira/browse/SPARK-33574
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Shuffle, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Min Shen
>            Priority: Major
>
> Currently, we only set locality for ShuffledRDD and ShuffledRowRDD with 
> push-based shuffle.
> In simple stage DAGs where a ShuffledRDD or ShuffledRowRDD is the only input 
> RDD, Spark handles locality well. However, in a join-like operation where a 
> stage consumes multiple shuffle inputs or other non-shuffle inputs, locality 
> suffers because of how DAGScheduler currently determines the preferred 
> locations.
> With push-based shuffle, we could potentially reuse the same set of merger 
> locations across sibling ShuffleMapStages. This would enable much better 
> locality on the reducer side, because the corresponding merged shuffle 
> partitions for the multiple shuffle inputs would already be colocated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
