[ 
https://issues.apache.org/jira/browse/SPARK-20178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15985447#comment-15985447
 ] 

Thomas Graves commented on SPARK-20178:
---------------------------------------

Another thing we should tie in here is better handling of preempted containers. 
This overlaps with my point above, "Improve logic around deciding which 
node is actually bad when you get a fetch failure," but it is a bit of a 
special case. If a container gets preempted on the YARN side, we need to 
detect that properly and not count it as a normal fetch failure. Right now 
that seems pretty difficult with the way we handle stage failures, but I guess 
you would just line that up and not count it as a normal stage failure.
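
As a rough illustration only (this is not Spark's scheduler code, and nothing 
here reflects an actual Spark or YARN API), the idea of leaving preempted 
containers out of the failure count might be sketched as follows. All names 
(ExitReason, FetchFailure, counts_as_fetch_failure, count_real_failures) are 
hypothetical, and the sketch assumes the scheduler already has some way to 
learn why an executor's container exited.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ExitReason(Enum):
    """Hypothetical reasons an executor's container went away."""
    PREEMPTED = auto()  # taken back by the resource manager
    FAILED = auto()     # crashed or was killed for some other reason
    UNKNOWN = auto()    # no exit information available


@dataclass(frozen=True)
class FetchFailure:
    """Hypothetical record of a failed shuffle fetch."""
    executor_id: str  # executor that was expected to serve the data
    host: str


def counts_as_fetch_failure(failure, exit_reasons):
    """Return False when the serving executor is known to have been preempted."""
    reason = exit_reasons.get(failure.executor_id, ExitReason.UNKNOWN)
    return reason is not ExitReason.PREEMPTED


def count_real_failures(failures, exit_reasons):
    """Count only the fetch failures that should go toward a stage failure limit."""
    return sum(1 for f in failures if counts_as_fetch_failure(f, exit_reasons))


if __name__ == "__main__":
    # Assumed input: a map from executor id to the reason its container exited.
    exit_reasons = {"exec-1": ExitReason.PREEMPTED, "exec-2": ExitReason.FAILED}
    failures = [
        FetchFailure("exec-1", "host-a"),
        FetchFailure("exec-2", "host-b"),
        FetchFailure("exec-3", "host-c"),
    ]
    print(count_real_failures(failures, exit_reasons))
```

In this sketch an executor with no recorded exit reason is still treated as a 
normal failure, so only a positively identified preemption is excluded.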

> Improve Scheduler fetch failures
> --------------------------------
>
>                 Key: SPARK-20178
>                 URL: https://issues.apache.org/jira/browse/SPARK-20178
>             Project: Spark
>          Issue Type: Epic
>          Components: Scheduler
>    Affects Versions: 2.1.0
>            Reporter: Thomas Graves
>
> We have been having a lot of discussions around improving the handling of 
> fetch failures. There are currently 4 JIRAs related to this: 
> SPARK-20163, SPARK-20091, SPARK-14649, and SPARK-19753.
> We should try to get a list of things we want to improve and come up with one 
> cohesive design.
> I will put my initial thoughts in a follow-on comment.



