Wan Kun created SPARK-44109:
-------------------------------

             Summary: Remove duplicate preferred locations of each RDD partition
                 Key: SPARK-44109
                 URL: https://issues.apache.org/jira/browse/SPARK-44109
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.5.0
            Reporter: Wan Kun


DAGScheduler computes the preferred locations for each RDD partition and tries to 
schedule the task on one of those locations.

We can deduplicate the preferred locations to save memory.

For example, if reduce 0 needs to fetch map 0's output and map 1's output from host-A, 
its preferred locations can be deduplicated to Array("host-A").
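A minimal sketch of the idea (illustrative names only, not the actual DAGScheduler internals): collapse the per-map-output host list for a partition down to its distinct hosts before storing it.

```scala
// Hypothetical sketch of deduplicating a partition's preferred locations.
// `hostsForPartition` stands in for the per-map-output locations that the
// scheduler collects; the object and method names here are illustrative.
object DedupLocations {
  def dedup(hostsForPartition: Seq[String]): Array[String] =
    // distinct preserves first-occurrence order while dropping duplicates
    hostsForPartition.distinct.toArray

  def main(args: Array[String]): Unit = {
    // reduce 0 fetches map 0 and map 1 outputs, both on host-A
    val locs = dedup(Seq("host-A", "host-A"))
    println(locs.mkString(","))
  }
}
```

With both map outputs on host-A, the stored location list shrinks from two entries to one, which is the memory saving this issue targets.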



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
