[ https://issues.apache.org/jira/browse/SPARK-14649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925125#comment-15925125 ]
Apache Spark commented on SPARK-14649:
--------------------------------------

User 'sitalkedia' has created a pull request for this issue:

https://github.com/apache/spark/pull/17297

> DagScheduler re-starts all running tasks on fetch failure
> ----------------------------------------------------------
>
>                 Key: SPARK-14649
>                 URL: https://issues.apache.org/jira/browse/SPARK-14649
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Sital Kedia
>
> When a fetch failure occurs, the DAGScheduler re-launches the previous stage
> (to re-generate the output that was lost), and then re-launches all tasks in
> the stage with the fetch failure that hadn't *completed* when the fetch
> failure occurred (the DAGScheduler re-launches all of the tasks whose output
> data is not available -- which is equivalent to the set of tasks that hadn't
> yet completed).
> The assumption when this code was originally written was that when a fetch
> failure occurred, the output from at least one of the tasks in the previous
> stage was no longer available, so all of the tasks in the current stage would
> eventually fail because they could not access that output. This assumption
> does not hold for some large-scale, long-running workloads. E.g., there is
> one use case where a job has ~100k tasks that each run for about 1 hour, and
> only the first 5-10 minutes are spent fetching data. Because of the large
> number of tasks, it's very common to see a few tasks fail in the fetch phase,
> and it's wasteful to re-run other tasks that had already finished fetching
> data and so aren't affected by the fetch failure (and may be most of the way
> through their hour-long execution). The DAGScheduler should not re-start
> these tasks.
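To make the behaviour described above concrete, here is a minimal, self-contained Scala sketch. The Stage class, the partitionsToResubmit helper, and all the numbers in it are illustrative assumptions, not Spark's actual DAGScheduler internals; it only models how resubmitting every uncompleted partition of the reduce stage throws away work from tasks that never touched the lost map output.

{code:scala}
// Hypothetical, simplified model of the behaviour described in this issue.
// None of these names correspond to real Spark scheduler classes.
object FetchFailureSketch {

  // Illustrative stage: a fixed number of partitions, some already completed.
  case class Stage(id: Int, numPartitions: Int, completed: Set[Int]) {
    // Partitions whose output has not been registered yet.
    def missingPartitions: Seq[Int] =
      (0 until numPartitions).filterNot(completed.contains)
  }

  // Roughly the current behaviour: on a fetch failure,
  //  1. resubmit the map partition whose output was lost, and
  //  2. resubmit EVERY reduce partition that has not completed, including
  //     tasks that already finished fetching and are mid-computation.
  def partitionsToResubmit(
      mapStage: Stage,
      reduceStage: Stage,
      lostMapPartition: Int): (Seq[Int], Seq[Int]) = {
    val mapPartitions    = Seq(lostMapPartition)          // regenerate lost output
    val reducePartitions = reduceStage.missingPartitions  // ALL unfinished reducers
    (mapPartitions, reducePartitions)
  }

  def main(args: Array[String]): Unit = {
    val mapStage    = Stage(id = 0, numPartitions = 4, completed = Set(0, 1, 2, 3))
    val reduceStage = Stage(id = 1, numPartitions = 6, completed = Set(0, 1))

    // Reduce partition 2 failed fetching map output 3; partitions 3-5 are
    // still running and were not affected by the lost output.
    val (maps, reduces) =
      partitionsToResubmit(mapStage, reduceStage, lostMapPartition = 3)

    println(s"map partitions to re-run:    $maps")    // List(3)
    println(s"reduce partitions to re-run: $reduces") // 2, 3, 4, 5 (wasteful)
  }
}
{code}

Under this toy model, a single lost map output causes reduce partitions 2 through 5 to be resubmitted even though only partition 2 actually hit the fetch failure; avoiding that wasted work for the still-running tasks is what this issue (and the linked pull request) asks for.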