[ https://issues.apache.org/jira/browse/SPARK-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia updated SPARK-1937:
---------------------------------
    Assignee: Rui Li

> Tasks can be submitted before executors are registered
> ------------------------------------------------------
>
>                 Key: SPARK-1937
>                 URL: https://issues.apache.org/jira/browse/SPARK-1937
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Rui Li
>            Assignee: Rui Li
>             Fix For: 1.1.0
>
>         Attachments: After-patch.PNG, Before-patch.png, RSBTest.scala
>
>
> During construction, TaskSetManager assigns tasks to several pending lists
> according to the tasks' preferred locations. If a desired location is
> unavailable, it assigns the task to "pendingTasksWithNoPrefs", a list of
> tasks without preferred locations.
> The problem is that tasks may be submitted before the executors are
> registered with the driver, in which case TaskSetManager assigns all the
> tasks to pendingTasksWithNoPrefs. Later, when it looks for a task to
> schedule, it picks one from this list and assigns it to an arbitrary
> executor, since TaskSetManager considers such tasks able to run equally
> well on any node.
> This deprives the job of the benefits of data locality, slows down the
> whole job, and can cause imbalance between executors.
> I ran into this issue when running a Spark program on a 7-node cluster
> (node6~node12) processing 100GB of data. Since the data was uploaded to
> HDFS from node6, that node has a complete copy of it; as a result, node6
> finishes tasks much faster, which in turn makes it complete
> disproportionately more tasks than the other nodes.
> To solve this issue, I think we shouldn't check the availability of
> executors/hosts when constructing TaskSetManager. If a task prefers a
> node, we simply add the task to that node's pending list. When the node
> is added later, TaskSetManager can schedule the task at the proper
> locality level.
> If, unfortunately, the preferred node(s) never get added, TaskSetManager
> can still schedule the task at locality level "ANY".

--
This message was sent by Atlassian JIRA
(v6.2#6252)
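The proposed change can be sketched in miniature. This is an illustrative model, not Spark's actual TaskSetManager code: the names Task, PendingLists, addTask, and dequeue are hypothetical. The point it demonstrates is the one in the description: a task is queued under its preferred host unconditionally, with no liveness check against currently registered executors, and a scheduler that has exhausted node-local options can still fall back to locality level ANY.

```scala
import scala.collection.mutable

// Hypothetical sketch of the proposed scheme (not Spark's real code).
case class Task(id: Int, preferredHosts: Seq[String])

class PendingLists {
  val pendingTasksForHost = mutable.HashMap.empty[String, mutable.ArrayBuffer[Int]]
  val pendingTasksWithNoPrefs = mutable.ArrayBuffer.empty[Int]

  def addTask(task: Task): Unit = {
    if (task.preferredHosts.isEmpty) {
      // Only tasks with genuinely no preference go here.
      pendingTasksWithNoPrefs += task.id
    } else {
      // Proposed behavior: no check that the host has a registered
      // executor. The task simply waits in its preferred host's list.
      task.preferredHosts.foreach { host =>
        pendingTasksForHost.getOrElseUpdate(host, mutable.ArrayBuffer.empty) += task.id
      }
    }
  }

  // Scheduling for `host`: prefer a node-local task; if `allowAny` (i.e.
  // the locality wait has expired), fall back to any pending task. A real
  // implementation must also skip tasks already launched via another
  // host's list; that de-duplication is omitted here for brevity.
  def dequeue(host: String, allowAny: Boolean): Option[Int] = {
    val local = pendingTasksForHost.get(host).flatMap { buf =>
      if (buf.nonEmpty) Some(buf.remove(buf.length - 1)) else None
    }
    local.orElse {
      if (allowAny) {
        pendingTasksForHost.valuesIterator.find(_.nonEmpty)
          .map(b => b.remove(b.length - 1))
          .orElse {
            if (pendingTasksWithNoPrefs.nonEmpty)
              Some(pendingTasksWithNoPrefs.remove(pendingTasksWithNoPrefs.length - 1))
            else None
          }
      } else None
    }
  }
}
```

Under this scheme, a task preferring node6 stays in node6's list even while node6 has no executor; it is picked up node-locally once node6 registers, or at ANY once the locality wait expires, instead of being misfiled into pendingTasksWithNoPrefs at construction time.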