[ https://issues.apache.org/jira/browse/SPARK-20178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020141#comment-16020141 ]
Josh Rosen commented on SPARK-20178:
------------------------------------

Sure, let me clarify:

* When a FetchFailure occurs, the DAGScheduler receives a fetch failure message of the form {{FetchFailed(bmAddress, shuffleId, mapId, reduceId, failureMessage)}}.
* As of today's Spark master branch, the DAGScheduler handles this failure by marking that individual output as unavailable and by marking all outputs on that executor as unavailable.
** See https://github.com/apache/spark/blob/9b09101938399a3490c3c9bde9e5f07031140fdf/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1339 and https://github.com/apache/spark/blob/9b09101938399a3490c3c9bde9e5f07031140fdf/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1346.
** As a shorthand, let's call this {{remove(shuffleId, mapId)}} followed by {{remove(blockManagerId)}}.

> Improve Scheduler fetch failures
> --------------------------------
>
>                 Key: SPARK-20178
>                 URL: https://issues.apache.org/jira/browse/SPARK-20178
>             Project: Spark
>          Issue Type: Epic
>          Components: Scheduler
>    Affects Versions: 2.1.0
>            Reporter: Thomas Graves
>
> We have been having a lot of discussions around improving the handling of
> fetch failures. There are 4 jira currently related to this.
> We should try to get a list of things we want to improve and come up with one
> cohesive design.
> SPARK-20163, SPARK-20091, SPARK-14649, and SPARK-19753
> I will put my initial thoughts in a follow on comment.
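
To make the shorthand from the comment concrete, here is a minimal toy model in Python of {{remove(shuffleId, mapId)}} followed by {{remove(blockManagerId)}}. It is only a sketch and not Spark's code: the class {{ToyMapOutputRegistry}}, its methods, and the string executor addresses are hypothetical names invented for illustration; only the field list of {{FetchFailed}} comes from the comment.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class FetchFailed:
    # Fields mirror the message shape quoted in the comment.
    bm_address: str
    shuffle_id: int
    map_id: int
    reduce_id: int
    failure_message: str


class ToyMapOutputRegistry:
    """Hypothetical stand-in for the scheduler's view of map outputs."""

    def __init__(self) -> None:
        # (shuffle_id, map_id) -> address of the block manager holding it
        self._locations: Dict[Tuple[int, int], str] = {}

    def register(self, shuffle_id: int, map_id: int, bm_address: str) -> None:
        self._locations[(shuffle_id, map_id)] = bm_address

    def remove_output(self, shuffle_id: int, map_id: int) -> None:
        # remove(shuffleId, mapId): forget the single output that failed.
        self._locations.pop((shuffle_id, map_id), None)

    def remove_block_manager(self, bm_address: str) -> None:
        # remove(blockManagerId): forget every output on that executor.
        lost = [key for key, addr in self._locations.items() if addr == bm_address]
        for key in lost:
            del self._locations[key]

    def handle_fetch_failed(self, failure: FetchFailed) -> None:
        self.remove_output(failure.shuffle_id, failure.map_id)
        self.remove_block_manager(failure.bm_address)

    def available(self) -> Set[Tuple[int, int]]:
        return set(self._locations)


registry = ToyMapOutputRegistry()
registry.register(0, 0, "exec-1")
registry.register(0, 1, "exec-1")
registry.register(0, 2, "exec-2")

# One failed fetch of map output 0 from exec-1 ...
registry.handle_fetch_failed(FetchFailed("exec-1", 0, 0, 3, "connection reset"))

# ... also drops map output 1, because it lived on the same executor.
remaining = registry.available()
```

In this toy run a single failure invalidates both outputs registered on {{exec-1}}, which is the executor-wide effect the comment describes; the output on {{exec-2}} is untouched.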