Kay Ousterhout created SPARK-19560:
--------------------------------------

             Summary: Improve tests for when DAGScheduler learns of 
"successful" ShuffleMapTask from a failed executor
                 Key: SPARK-19560
                 URL: https://issues.apache.org/jira/browse/SPARK-19560
             Project: Spark
          Issue Type: Test
          Components: Scheduler
    Affects Versions: 2.1.1
            Reporter: Kay Ousterhout
            Assignee: Kay Ousterhout
            Priority: Minor


The DAGScheduler has some tricky code for the case where it learns of a 
ShuffleMapTask that completed successfully but ran on an executor that failed 
sometime after the task was launched.  This case is tricky because the 
TaskSetManager (i.e., the lower-level scheduler) thinks the task completed 
successfully, but the DAGScheduler considers the output it generated to be no 
longer valid (because it was probably lost when the executor was lost).  As a 
result, the DAGScheduler needs to re-submit the stage so that the task can be 
re-run.  Some existing tests exercise this behavior, but none documents it 
clearly, so we should improve the tests to prevent future bugs (this was 
encountered by [~markhamstra] while attempting to find a better fix for 
SPARK-19263).
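
To make the scenario concrete, here is a minimal toy model of the bookkeeping 
described above.  It is a sketch only, not Spark code: the class 
ToyDagScheduler and all of its methods are hypothetical names invented for 
illustration, and it ignores details of the real DAGScheduler.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDagScheduler:
    """Hypothetical stand-in for the DAGScheduler's view of one shuffle map stage."""

    num_partitions: int
    failed_executors: set = field(default_factory=set)
    map_outputs: dict = field(default_factory=dict)  # partition -> executor id
    resubmissions: int = 0

    def executor_lost(self, executor_id):
        # Remember the failure and forget any output that lived on that executor.
        self.failed_executors.add(executor_id)
        self.map_outputs = {
            p: e for p, e in self.map_outputs.items() if e != executor_id
        }

    def shuffle_map_task_succeeded(self, partition, executor_id):
        # The lower-level scheduler reports success, but the output is only
        # registered if the executor is not known to have failed.
        if executor_id in self.failed_executors:
            return False
        self.map_outputs[partition] = executor_id
        return True

    def finish_task_set(self):
        # Every task has reported success; any partition still without a
        # registered output forces a re-submission of the stage.
        missing = [
            p for p in range(self.num_partitions) if p not in self.map_outputs
        ]
        if missing:
            self.resubmissions += 1
        return missing


scheduler = ToyDagScheduler(num_partitions=2)
scheduler.shuffle_map_task_succeeded(0, "exec-1")
scheduler.executor_lost("exec-2")
# "Successful" task from the failed executor: its output is ignored.
accepted = scheduler.shuffle_map_task_succeeded(1, "exec-2")
to_rerun = scheduler.finish_task_set()
```

In this toy run, both tasks reported success to the lower-level scheduler, yet 
partition 1 is still missing from the scheduler's view, so the stage has to be 
re-submitted.  A clearer test for the real code would presumably assert 
something analogous.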



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)