[ https://issues.apache.org/jira/browse/SPARK-20996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-20996:
------------------------------------

    Assignee:     (was: Apache Spark)

> Better handling AM reattempt based on exit code in yarn mode
> ------------------------------------------------------------
>
>                 Key: SPARK-20996
>                 URL: https://issues.apache.org/jira/browse/SPARK-20996
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 2.2.0
>            Reporter: Saisai Shao
>            Priority: Minor
>
> YARN provides a max-attempts configuration for applications running on it,
> so an application has the chance to retry itself when it fails. In the
> current Spark code, the RM restarts the AM after any failure, regardless of
> its cause, as long as the max attempt count has not been reached. This is not
> reasonable when the failure originates in the AM itself, for example a user
> code failure, an OOM, a Spark issue, or executor failures: in those cases a
> reattempt of the AM is very likely to hit the same issue again. The AM should
> be retried only when it fails because of an external issue, such as a crash,
> a process kill, or an NM failure.
> So this issue proposes improving the reattempt mechanism so that the AM is
> retried only when it meets external issues.
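The proposed policy can be sketched as a small decision function: classify the AM's exit code as an internal or external failure, and allow a reattempt only for external failures that are still under the max-attempts limit. This is a minimal sketch, not Spark's implementation. The exit-code values in INTERNAL_EXIT_CODES and the names classify, should_reattempt and FailureOrigin are hypothetical, chosen for illustration only; they are not Spark's or YARN's actual constants or APIs.

```python
from enum import Enum


class FailureOrigin(Enum):
    """Where an AM failure came from (categories taken from the proposal)."""
    INTERNAL = "internal"  # user code failure, OOM, Spark issue, executor failures
    EXTERNAL = "external"  # crash, process kill, NM failure


# Hypothetical exit codes the AM would report for failures originating inside
# the AM itself. These values are assumptions for illustration only.
INTERNAL_EXIT_CODES = {
    10,  # assumed: uncaught exception in the AM (Spark issue)
    11,  # assumed: too many executor failures
    15,  # assumed: exception thrown by the user class
    52,  # assumed: OutOfMemoryError
}


def classify(exit_code: int) -> FailureOrigin:
    """Known AM-internal exit codes are INTERNAL; anything else is EXTERNAL."""
    if exit_code in INTERNAL_EXIT_CODES:
        return FailureOrigin.INTERNAL
    return FailureOrigin.EXTERNAL


def should_reattempt(exit_code: int, attempt: int, max_attempts: int) -> bool:
    """Return True if another AM attempt should be started.

    `attempt` is the 1-based number of the attempt that just finished.
    """
    if exit_code == 0:
        return False  # the AM succeeded; nothing to retry
    if attempt >= max_attempts:
        return False  # the max-attempts limit has been reached
    # Retry only failures that did not originate in the AM itself.
    return classify(exit_code) is FailureOrigin.EXTERNAL
```

For example, with max_attempts=2, should_reattempt(11, 1, 2) is False (an assumed internal failure), while should_reattempt(137, 1, 2) is True, since 137 (128 + SIGKILL) is not in the internal set. Unknown exit codes default to EXTERNAL so that anything not positively identified as an AM-internal failure keeps the current retry behaviour.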



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

