[ 
https://issues.apache.org/jira/browse/SPARK-19560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862248#comment-15862248
 ] 

Apache Spark commented on SPARK-19560:
--------------------------------------

User 'kayousterhout' has created a pull request for this issue:
https://github.com/apache/spark/pull/16892

> Improve tests for when DAGScheduler learns of "successful" ShuffleMapTask 
> from a failed executor
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19560
>                 URL: https://issues.apache.org/jira/browse/SPARK-19560
>             Project: Spark
>          Issue Type: Test
>          Components: Scheduler
>    Affects Versions: 2.1.1
>            Reporter: Kay Ousterhout
>            Assignee: Kay Ousterhout
>            Priority: Minor
>
> The code is tricky around the case where the DAGScheduler learns of a
> ShuffleMapTask that completed successfully but ran on an executor that
> failed sometime after the task was launched.  The case is tricky because the
> TaskSetManager (i.e., the lower-level scheduler) thinks the task completed
> successfully, but the DAGScheduler considers the output it generated to be no
> longer valid (because it was probably lost when the executor was lost).  As a
> result, the DAGScheduler needs to re-submit the stage so that the task can
> be re-run.  Some existing tests exercise this behavior, but none documents it
> clearly, so we should improve the tests to prevent future bugs (this was
> encountered by [~markhamstra] in attempting to find a better fix for
> SPARK-19263).
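
The scenario described above can be sketched with a small toy model. This is
not Spark's code: ToyStage, ToyDagScheduler, and every method name below are
hypothetical, and the epoch comparison used to decide whether a completion is
stale is an assumption of this sketch, not something the issue text states.
It only illustrates the shape of the behavior: a "successful" completion
from a lost executor is ignored, so the partition stays missing and has to
be re-run.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStage:
    """Hypothetical stand-in for a shuffle map stage (not a Spark class)."""
    num_partitions: int
    # partition id -> id of the executor holding that partition's map output
    outputs: dict = field(default_factory=dict)

    def missing_partitions(self):
        return [p for p in range(self.num_partitions) if p not in self.outputs]


class ToyDagScheduler:
    """Toy model only; the names and the epoch rule are assumptions."""

    def __init__(self):
        self.epoch = 0
        self.failed_epoch = {}  # executor id -> epoch at which it was lost

    def executor_lost(self, exec_id, stage):
        # Remember when the executor was lost and forget its map outputs.
        self.failed_epoch[exec_id] = self.epoch
        self.epoch += 1
        for p in [p for p, e in stage.outputs.items() if e == exec_id]:
            del stage.outputs[p]

    def task_succeeded(self, stage, partition, exec_id, task_epoch):
        # A task launched at or before the executor's failure epoch may have
        # written output that no longer exists, so its success is ignored.
        lost_at = self.failed_epoch.get(exec_id)
        if lost_at is not None and task_epoch <= lost_at:
            return False
        stage.outputs[partition] = exec_id
        return True

    def partitions_to_resubmit(self, stage):
        return stage.missing_partitions()


scheduler = ToyDagScheduler()
stage = ToyStage(num_partitions=2)
launch_epoch = scheduler.epoch  # both tasks are launched before any failure

# Partition 0 finishes on a healthy executor and is recorded.
first_accepted = scheduler.task_succeeded(stage, 0, "exec-A", launch_epoch)

# exec-B is lost, then its "successful" completion for partition 1 arrives.
scheduler.executor_lost("exec-B", stage)
late_accepted = scheduler.task_succeeded(stage, 1, "exec-B", launch_epoch)
to_rerun = scheduler.partitions_to_resubmit(stage)

# Re-running partition 1 on a healthy executor completes the stage.
rerun_accepted = scheduler.task_succeeded(stage, 1, "exec-A", scheduler.epoch)
remaining = scheduler.partitions_to_resubmit(stage)
```

In this sketch the lower-level view (the task reported success) and the
scheduler's view (the output is unusable) disagree exactly as the
description says, and the disagreement is resolved by treating partition 1
as still missing.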



