[ 
https://issues.apache.org/jira/browse/SPARK-19502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Ousterhout closed SPARK-19502.
----------------------------------
    Resolution: Not A Problem

This code actually is currently needed to handle cases where a ShuffleMapTask 
succeeds on an executor, but that executor was marked as failed (so the task 
needs to be re-run), as described in this comment: 
https://github.com/apache/spark/pull/16620#issuecomment-279125227

> Remove unnecessary code to re-submit stages in the DAGScheduler
> ---------------------------------------------------------------
>
>                 Key: SPARK-19502
>                 URL: https://issues.apache.org/jira/browse/SPARK-19502
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.1.1
>            Reporter: Kay Ousterhout
>            Assignee: Kay Ousterhout
>            Priority: Minor
>
> There are a [few lines of code in the 
> DAGScheduler](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1215)
>  to re-submit shuffle map stages when some of the tasks fail.  My 
> understanding is that there should be a 1:1 mapping between pending tasks 
> (which are tasks that haven't completed successfully) and available output 
> locations, so that code should never be reachable.  Furthermore, the approach 
> taken by that code (to re-submit an entire stage as a result of task 
> failures) is not how we handle task failures in a stage (the lower-level 
> scheduler resubmits the individual tasks) which is what the 5-years-old TODO 
> on that code seems to be implying should be done.
> The big caveat is that there's a bug being fixed in SPARK-19263 that means 
> there is *not* a 1:1 relationship between pendingTasks and available 
> outputLocations, so that code is serving as a (buggy) band-aid.  This should 
> be fixed once we resolve SPARK-19263.
> cc [~imranr] [~markhamstra] [~jinxing6...@126.com] (let me know if any of you 
> see any reason we actually do need that code)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to