[ https://issues.apache.org/jira/browse/SPARK-19560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kay Ousterhout closed SPARK-19560.
----------------------------------
    Resolution: Fixed
    Target Version/s: 2.2.0

> Improve tests for when DAGScheduler learns of "successful" ShuffleMapTask from a failed executor
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19560
>                 URL: https://issues.apache.org/jira/browse/SPARK-19560
>             Project: Spark
>          Issue Type: Test
>          Components: Scheduler
>    Affects Versions: 2.1.1
>            Reporter: Kay Ousterhout
>            Assignee: Kay Ousterhout
>            Priority: Minor
>
> There's some tricky code around the case when the DAGScheduler learns of a
> ShuffleMapTask that completed successfully, but ran on an executor that
> failed sometime after the task was launched. This case is tricky because the
> TaskSetManager (i.e., the lower level scheduler) thinks the task completed
> successfully, but the DAGScheduler considers the output it generated to be no
> longer valid (because it was probably lost when the executor was lost). As a
> result, the DAGScheduler needs to re-submit the stage, so that the task can
> be re-run. This is tested in some of the tests but not clearly documented,
> so we should improve this to prevent future bugs (this was encountered by
> [~markhamstra] in attempting to find a better fix for SPARK-19263).
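
The case described in the issue can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is not Spark code, the class `ToyDAGScheduler` and all of its methods are hypothetical names, and the epoch counter is a simplification chosen here to decide whether a task was launched before its executor was lost. It shows a "successful" task report being rejected because the executor that ran it has since failed, which leaves that partition to be re-run.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDAGScheduler:
    """Toy model only; names are hypothetical and not Spark's API."""

    num_partitions: int
    epoch: int = 0  # bumped every time an executor is lost
    failed_epoch: dict = field(default_factory=dict)  # executor -> epoch at loss
    outputs: dict = field(default_factory=dict)  # partition -> executor with output

    def launch_task(self, partition, executor):
        # Each task remembers the epoch at which it was launched.
        return {"partition": partition, "executor": executor, "epoch": self.epoch}

    def executor_lost(self, executor):
        self.failed_epoch[executor] = self.epoch
        self.epoch += 1
        # Map output stored on the lost executor is assumed to be gone.
        self.outputs = {p: e for p, e in self.outputs.items() if e != executor}

    def task_succeeded(self, task):
        """Return True if the reported success is accepted as valid output."""
        lost_at = self.failed_epoch.get(task["executor"])
        if lost_at is not None and task["epoch"] <= lost_at:
            # The task was launched before the executor was lost, so its
            # output is presumed lost as well; do not register it.
            return False
        self.outputs[task["partition"]] = task["executor"]
        return True

    def missing_partitions(self):
        # Partitions that still need a task run before the stage is done.
        return [p for p in range(self.num_partitions) if p not in self.outputs]


sched = ToyDAGScheduler(num_partitions=2)
t0 = sched.launch_task(0, "exec-A")
t1 = sched.launch_task(1, "exec-B")

sched.executor_lost("exec-A")          # exec-A fails after t0 was launched

accepted0 = sched.task_succeeded(t0)   # late "success" from the lost executor
accepted1 = sched.task_succeeded(t1)   # ordinary success on a live executor
to_rerun = sched.missing_partitions()  # partitions the stage must re-run

retry = sched.launch_task(0, "exec-B")
accepted_retry = sched.task_succeeded(retry)
remaining = sched.missing_partitions()
```

In this model the lower-level view ("the task finished") and the scheduler's view ("the output is not usable") disagree for `t0`, which is the mismatch the issue asks the tests to document. Partition 0 stays in `to_rerun` until the retry on a live executor is accepted.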