[ https://issues.apache.org/jira/browse/SPARK-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551836#comment-14551836 ]
Saisai Shao commented on SPARK-4352:
------------------------------------

Hi Sandy, I went back and retrieved the old code that supports preferredNodeLocations on YARN. It takes task distribution into account via {{generateNodeToWeight}}, which addresses some of the questions you raised above, but I think it is hard to apply that mechanism to dynamic allocation. Suppose we already have 3 containers and then request 1 more with a new list of preferred localities: do we need to kill all the old containers and re-request everything based on the new preferences? If so, the overhead will be high; if not, locality will not be optimal. So we can only try to compute a partially optimal allocation strategy; it is hard to maintain a globally optimal one. Sorry for my immature consideration; I will rethink my design and improve it.

> Incorporate locality preferences in dynamic allocation requests
> ---------------------------------------------------------------
>
>                 Key: SPARK-4352
>                 URL: https://issues.apache.org/jira/browse/SPARK-4352
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, YARN
>    Affects Versions: 1.2.0
>            Reporter: Sandy Ryza
>            Priority: Critical
>
> Currently, achieving data locality in Spark is difficult unless an application takes resources on every node in the cluster. preferredNodeLocalityData provides a sort of hacky workaround that has been broken since 1.0.
>
> With dynamic executor allocation, Spark requests executors in response to demand from the application. When this occurs, it would be useful to look at the pending tasks and communicate their location preferences to the cluster resource manager.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
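As a purely illustrative sketch of the trade-off discussed in the comment (the names {{LocalityWeights}}, {{generateNodeToWeight}}, and {{preferredHostsForNewRequest}} are assumptions for this example, not Spark's actual API): a node-to-weight computation can derive per-host weights from the preferred locations of pending tasks, and only the *new* container request carries those preferences, leaving existing containers in place. This is the "partially optimal" strategy described above.

```scala
// Hypothetical sketch, not Spark's real implementation.
// Each pending task carries a list of preferred hosts; the weight of a
// host is the number of pending tasks that prefer it.
object LocalityWeights {
  def generateNodeToWeight(pendingTaskPrefs: Seq[Seq[String]]): Map[String, Int] =
    pendingTaskPrefs.flatten
      .groupBy(identity)
      .map { case (host, occurrences) => host -> occurrences.size }

  // Attach preferences only to the new request, highest weight first,
  // skipping hosts that already run a container (partial, not global,
  // optimality -- old containers are never killed and re-requested).
  def preferredHostsForNewRequest(weights: Map[String, Int],
                                  existingHosts: Set[String]): Seq[String] =
    weights.toSeq
      .sortBy { case (_, w) => -w }
      .collect { case (host, _) if !existingHosts.contains(host) => host }
}
```

For example, if two pending tasks prefer host1 and one prefers host2, and a container already runs on host2, the new request would list only host1. The design choice here mirrors the comment: recomputing preferences for every outstanding container would give a globally better placement but at the cost of tearing down running executors.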