Wan Kun created SPARK-44109:
-------------------------------

             Summary: Remove duplicate preferred locations of each RDD partition
                 Key: SPARK-44109
                 URL: https://issues.apache.org/jira/browse/SPARK-44109
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.5.0
            Reporter: Wan Kun
DAGScheduler gets the preferred locations for each RDD partition and tries to schedule the task on one of those locations. We can remove duplicate preferred locations to save memory. For example, if reduce 0 needs to fetch map 0's output and map 1's output, both on host-A, the preferred locations can be deduplicated to Array("host-A").

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
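A minimal sketch of the deduplication described in the issue. The helper names here are hypothetical for illustration, not the actual DAGScheduler code:

```scala
// Hypothetical helper, not the actual Spark DAGScheduler implementation:
// deduplicate a partition's preferred locations while preserving order.
object PreferredLocations {
  def dedup(locations: Seq[String]): Seq[String] = locations.distinct
}

object Demo {
  def main(args: Array[String]): Unit = {
    // reduce 0 fetches map 0 and map 1 output, both located on host-A,
    // so only one copy of "host-A" needs to be kept.
    val locs = Seq("host-A", "host-A")
    assert(PreferredLocations.dedup(locs) == Seq("host-A"))
  }
}
```

`Seq.distinct` keeps the first occurrence of each element in order, so a stable preference ordering across hosts is preserved.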