[ 
https://issues.apache.org/jira/browse/SPARK-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia updated SPARK-1937:
---------------------------------

    Assignee: Rui Li

> Tasks can be submitted before executors are registered
> ------------------------------------------------------
>
>                 Key: SPARK-1937
>                 URL: https://issues.apache.org/jira/browse/SPARK-1937
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Rui Li
>            Assignee: Rui Li
>             Fix For: 1.1.0
>
>         Attachments: After-patch.PNG, Before-patch.png, RSBTest.scala
>
>
> During construction, TaskSetManager assigns tasks to several pending 
> lists according to the tasks’ preferred locations. If the desired location 
> is unavailable, it assigns the task to “pendingTasksWithNoPrefs”, a list 
> of tasks without preferred locations.
> The problem is that tasks may be submitted before the executors have 
> registered with the driver, in which case TaskSetManager assigns all the 
> tasks to pendingTasksWithNoPrefs. Later, when it looks for a task to 
> schedule, it picks one from this list and assigns it to an arbitrary 
> executor, since TaskSetManager considers that such tasks can run equally 
> well on any node.
> This deprives the job of the benefits of data locality, slows the whole 
> job down, and can cause load imbalance between executors.
> I ran into this issue when running a Spark program on a 7-node cluster 
> (node6~node12). The program processes 100GB of data.
> Since the data is uploaded to HDFS from node6, this node has a complete 
> copy of the data. As a result, node6 finishes tasks much faster, which in 
> turn makes it complete disproportionately more tasks than the other nodes.
> To solve this issue, I think we shouldn't check the availability of 
> executors/hosts when constructing TaskSetManager. If a task prefers a 
> node, we simply add the task to that node’s pending list. When the node is 
> added later on, TaskSetManager can schedule the task at the proper 
> locality level. If, unfortunately, the preferred node(s) never get added, 
> TaskSetManager can still schedule the task at locality level “ANY”.
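The behavior described above, and the proposed fix, can be sketched as follows. This is a minimal model with hypothetical names (Task, PendingLists, checkAlive), not the actual TaskSetManager code: with the current behavior, a task submitted before its preferred host registers falls into the no-prefs list; with the proposed fix, it is queued under its preferred host regardless of current availability.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of a task with preferred locations.
case class Task(id: Int, preferredHosts: Seq[String])

// checkAlive = true models the current behavior (only already-registered
// hosts count); checkAlive = false models the proposed fix.
class PendingLists(aliveHosts: Set[String], checkAlive: Boolean) {
  val pendingForHost = mutable.Map.empty[String, mutable.ArrayBuffer[Int]]
  val pendingNoPrefs = mutable.ArrayBuffer.empty[Int]

  def addTask(task: Task): Unit = {
    val hosts =
      if (checkAlive) task.preferredHosts.filter(aliveHosts.contains)
      else task.preferredHosts // fix: queue by preference as-is
    if (hosts.isEmpty) {
      // No usable preference: the task can be scheduled on any executor.
      pendingNoPrefs += task.id
    } else {
      hosts.foreach { h =>
        pendingForHost.getOrElseUpdate(h, mutable.ArrayBuffer.empty) += task.id
      }
    }
  }
}
```

With no executors registered yet, `checkAlive = true` sends a task preferring node6 to `pendingNoPrefs`, while `checkAlive = false` keeps it in node6's pending list, so locality can still be honored once node6 registers.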



--
This message was sent by Atlassian JIRA
(v6.2#6252)
