[ https://issues.apache.org/jira/browse/SPARK-20996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16038404#comment-16038404 ]
Apache Spark commented on SPARK-20996:
--------------------------------------

User 'jerryshao' has created a pull request for this issue:
https://github.com/apache/spark/pull/18213

> Better handling AM reattempt based on exit code in yarn mode
> ------------------------------------------------------------
>
>                 Key: SPARK-20996
>                 URL: https://issues.apache.org/jira/browse/SPARK-20996
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 2.2.0
>            Reporter: Saisai Shao
>            Priority: Minor
>
> YARN provides a max-attempt configuration for applications running on it,
> so a failed application has the chance to retry. In the current Spark code,
> the RM restarts the AM after any failure, as long as the failure count has
> not reached the max attempts. This is unreasonable when the failure
> originates in the AM itself (a user code failure, OOM, a Spark bug, or
> executor failures): the reattempted AM will very likely hit the same issue
> again. The AM should only be retried when it fails for an external reason,
> such as a crash, a process kill, or an NM failure.
> So this proposes improving the reattempt mechanism to retry only when the
> AM fails due to external issues.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
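The reattempt policy the issue proposes can be sketched as an exit-code classification: internal AM failures should not count as retryable, while external failures should. The exit-code constants and the `shouldRetry` helper below are illustrative assumptions for this sketch, not Spark's actual constants or API.

```java
public class AmRetryPolicySketch {
    // Hypothetical exit codes for failures originating inside the AM.
    static final int EXIT_USER_CODE_FAILURE = 15;
    static final int EXIT_OOM = 16;
    static final int EXIT_MAX_EXECUTOR_FAILURES = 11;
    // 137 = 128 + SIGKILL(9): the AM process was killed externally.
    static final int EXIT_SIGKILL = 137;

    /** Returns true if the AM failure is external and a reattempt is worthwhile. */
    static boolean shouldRetry(int exitCode) {
        switch (exitCode) {
            case EXIT_USER_CODE_FAILURE:
            case EXIT_OOM:
            case EXIT_MAX_EXECUTOR_FAILURES:
                // Internal failure: a reattempt would most likely fail the same way.
                return false;
            default:
                // External failure (crash, process kill, NM failure): retry.
                return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(EXIT_SIGKILL));           // prints true
        System.out.println(shouldRetry(EXIT_USER_CODE_FAILURE)); // prints false
    }
}
```

In this sketch, only the failure categories change; the existing YARN max-attempt limit would still cap how many external-failure retries are allowed.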