[ https://issues.apache.org/jira/browse/SPARK-13931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kay Ousterhout resolved SPARK-13931.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.0

> Resolve stage hanging up problem in a particular case
> -----------------------------------------------------
>
>                 Key: SPARK-13931
>                 URL: https://issues.apache.org/jira/browse/SPARK-13931
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.4.1, 1.5.2, 1.6.0, 1.6.1
>            Reporter: ZhengYaofeng
>             Fix For: 2.2.0
>
>
> Suppose the following steps:
> 1. Enable speculation in the application.
> 2. Run the app and suppose the last task of shuffleMapStage 1 finishes. To be
> precise: from the DAGScheduler's point of view this stage is finished, and
> from the TaskSetManager's point of view the variable 'isZombie' is set to
> true, but runningTasksSet is not empty because of speculation.
> 3. Suddenly, executor 3 is lost. On receiving this signal, the TaskScheduler
> invokes executorLost on every taskSetManager in rootPool, and the
> DAGScheduler removes all of this executor's outputLocs.
> 4. The TaskSetManager adds all of this executor's tasks to pendingTasks and
> tells the DAGScheduler that they will be resubmitted (note: this
> notification may arrive late).
> 5. The DAGScheduler starts to submit a new waiting stage, say shuffleMapStage
> 2, and finds that shuffleMapStage 1 is a missing parent because some of its
> outputLocs were removed when the executor was lost. The DAGScheduler then
> submits shuffleMapStage 1 again.
> 6. The DAGScheduler still receives task 'Resubmitted' signals from the old
> taskSetManager and increases the number of pendingTasks of shuffleMapStage 1
> each time. However, the old taskSetManager will not offer any new task to
> launch because its variable 'isZombie' is set to true.
> 7. As a result, shuffleMapStage 1 never finishes in the DAGScheduler, and
> neither do any of the stages that depend on it.
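
The bookkeeping mismatch in steps 2 to 7 can be illustrated with a toy model. This is only a sketch under stated assumptions, not Spark's code: `ToyTaskSetManager`, `ToyStage`, and `simulate` are hypothetical names, and the stage's pendingTasks is modeled as a plain counter.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTaskSetManager:
    """Hypothetical stand-in for a TaskSetManager (not Spark's real class)."""
    is_zombie: bool = False
    pending: list = field(default_factory=list)

    def executor_lost(self, lost_partitions):
        # Step 4: re-queue the lost executor's tasks and report each as 'Resubmitted'.
        self.pending.extend(lost_partitions)
        return ["Resubmitted" for _ in lost_partitions]

    def next_task(self):
        # Step 6: a zombie manager never offers another task to launch.
        if self.is_zombie or not self.pending:
            return None
        return self.pending.pop(0)


@dataclass
class ToyStage:
    """Hypothetical stand-in for the scheduler's view of a stage (assumed counter)."""
    pending_tasks: int = 0

    @property
    def finished(self):
        return self.pending_tasks == 0


def simulate():
    stage1 = ToyStage(pending_tasks=0)            # step 2: the stage looks finished
    old_tsm = ToyTaskSetManager(is_zombie=True)   # step 2: its manager is a zombie

    signals = old_tsm.executor_lost([0])          # steps 3-4: the executor held partition 0

    # Step 5: stage 1 is resubmitted for the missing partition with a new manager.
    stage1.pending_tasks += 1
    new_tsm = ToyTaskSetManager(pending=[0])
    if new_tsm.next_task() is not None:
        stage1.pending_tasks -= 1                 # the new attempt succeeds

    # Step 6: the late 'Resubmitted' signals from the old manager arrive.
    stage1.pending_tasks += len(signals)
    return stage1, old_tsm


stage1, old_tsm = simulate()
print(stage1.finished, old_tsm.next_task())
```

In this model the counter stays above zero after the new attempt succeeds, and the only manager still holding the re-queued task is the zombie, so nothing can ever bring the counter back to zero (step 7).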