[ https://issues.apache.org/jira/browse/SPARK-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902982#comment-14902982 ]
Sandy Ryza commented on SPARK-10739:
------------------------------------

That's the one I was referring to as well. That one is about executor failure, whereas this is about AM failure, so they are different issues.

> Add attempt window for long running Spark application on Yarn
> -------------------------------------------------------------
>
>                 Key: SPARK-10739
>                 URL: https://issues.apache.org/jira/browse/SPARK-10739
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>            Reporter: Saisai Shao
>            Priority: Minor
>
> Currently Spark on YARN uses a maximum attempt count to limit failures: once an application's failure count reaches the maximum, the application will no longer be recovered by the RM. This is not very effective for long-running applications, which can easily exceed the maximum over a long enough period, while setting a very large maximum hides real problems.
> So this issue introduces an attempt window to limit application attempts, ignoring attempts that fall outside the window. This mechanism was introduced in Hadoop 2.6+ to support long-running applications, and it is quite useful for Spark Streaming, Spark shell, and similar applications.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
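As a rough sketch of how the attempt window described above might be configured (the property names below are the ones this work eventually shipped as in Spark's YARN support and in Hadoop 2.6+; treat them as illustrative and verify against your Spark/Hadoop versions):

```
# spark-defaults.conf (illustrative values)

# Upper bound on AM attempts; without a validity window, a long-running
# app eventually exhausts this even if failures are years apart.
spark.yarn.maxAppAttempts  4

# Attempt-failure validity interval: AM failures older than this window
# are ignored when counting toward maxAppAttempts (requires Hadoop 2.6+).
spark.yarn.am.attemptFailuresValidityInterval  1h
```

With this configuration, only AM failures within the last hour count toward the four-attempt limit, so a streaming job that fails occasionally over months is still restarted, while four failures in quick succession (likely a real, persistent problem) still terminate the application.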