[ 
https://issues.apache.org/jira/browse/SPARK-20178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020141#comment-16020141
 ] 

Josh Rosen commented on SPARK-20178:
------------------------------------

Sure, let me clarify:

* When a FetchFailure occurs, the DAGScheduler receives a fetch failure message 
of the form {{FetchFailed(bmAddress, shuffleId, mapId, reduceId, 
failureMessage)}}.
* As of today's Spark master branch, the DAGScheduler handles this failure by 
marking that individual output as unavailable and by marking all outputs on 
that executor as unavailable.
** See 
https://github.com/apache/spark/blob/9b09101938399a3490c3c9bde9e5f07031140fdf/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1339
 and 
https://github.com/apache/spark/blob/9b09101938399a3490c3c9bde9e5f07031140fdf/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1346.
** As a shorthand, let's call this {{remove(shuffleId, mapId)}} followed by 
{{remove(blockManagerId)}}.
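
To make the shorthand concrete, here is a minimal toy model of the two removals. It is only a sketch and not Spark's actual code: the {{MapOutputRegistry}} class, its method names, and the use of a plain string for the block manager address are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FetchFailed:
    """Mirrors the fields of the FetchFailed message described above."""
    bm_address: str       # assumption: a plain string stands in for the block manager id
    shuffle_id: int
    map_id: int
    reduce_id: int
    failure_message: str


class MapOutputRegistry:
    """Hypothetical stand-in for the scheduler's map-output bookkeeping."""

    def __init__(self) -> None:
        # (shuffle_id, map_id) -> address of the block manager holding that output
        self.outputs: Dict[Tuple[int, int], str] = {}

    def register(self, shuffle_id: int, map_id: int, bm_address: str) -> None:
        self.outputs[(shuffle_id, map_id)] = bm_address

    def remove_output(self, shuffle_id: int, map_id: int) -> None:
        # remove(shuffleId, mapId): mark the one output that failed to fetch as unavailable
        self.outputs.pop((shuffle_id, map_id), None)

    def remove_block_manager(self, bm_address: str) -> None:
        # remove(blockManagerId): mark every output on that executor as unavailable
        self.outputs = {
            key: addr for key, addr in self.outputs.items() if addr != bm_address
        }

    def handle_fetch_failed(self, failure: FetchFailed) -> None:
        # The two steps run back to back, in this order.
        self.remove_output(failure.shuffle_id, failure.map_id)
        self.remove_block_manager(failure.bm_address)


registry = MapOutputRegistry()
registry.register(0, 0, "exec-1")
registry.register(0, 1, "exec-1")
registry.register(0, 2, "exec-2")

# One failed fetch of map output 0 on exec-1 invalidates both outputs on exec-1.
registry.handle_fetch_failed(FetchFailed("exec-1", 0, 0, 5, "connection reset"))
remaining = sorted(registry.outputs)
```

In this model a single fetch failure against one output also drops the other output registered on the same executor, so only the output on the second executor is still considered available.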

> Improve Scheduler fetch failures
> --------------------------------
>
>                 Key: SPARK-20178
>                 URL: https://issues.apache.org/jira/browse/SPARK-20178
>             Project: Spark
>          Issue Type: Epic
>          Components: Scheduler
>    Affects Versions: 2.1.0
>            Reporter: Thomas Graves
>
> We have been having a lot of discussions around improving the handling of 
> fetch failures. There are four JIRAs currently related to this: 
> SPARK-20163, SPARK-20091, SPARK-14649, and SPARK-19753.
> We should try to get a list of things we want to improve and come up with one 
> cohesive design.
> I will put my initial thoughts in a follow-on comment.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

