[ https://issues.apache.org/jira/browse/SPARK-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901866#comment-14901866 ]
Sean Owen commented on SPARK-10739:
-----------------------------------

I'm sure we've discussed this one before and that there's a JIRA for it, but I can't for the life of me find it. I feel like [~sandyr] or [~vanzin] commented on it. The question was how far back you look when deciding whether "a lot" of failures have occurred, etc.

> Add attempt window for long running Spark application on Yarn
> -------------------------------------------------------------
>
>                 Key: SPARK-10739
>                 URL: https://issues.apache.org/jira/browse/SPARK-10739
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>            Reporter: Saisai Shao
>            Priority: Minor
>
> Currently, Spark on YARN uses a maximum attempt count to limit application
> failures: once an application's failure count reaches the maximum, the RM
> will no longer recover it. This is not very effective for long-running
> applications, which can easily exceed the maximum over a long period, and
> setting a very large maximum would hide the real problem.
> This issue therefore introduces an attempt window to control the number of
> application attempts: attempts that fall outside the window are ignored.
> The attempt window was introduced in Hadoop 2.6+ to support long-running
> applications, and it is quite useful for Spark Streaming and for
> Spark shell-like applications.
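
To make the "how far back do you look" question concrete, here is a minimal sketch of attempt-window counting in Python. It is only an illustration of the idea described in the issue, not how YARN or Spark implements it; the class name `AttemptWindow` and the parameters `max_attempts` and `window_ms` are hypothetical.

```python
from collections import deque


class AttemptWindow:
    """Hypothetical sketch: count only failures that fall inside a sliding window."""

    def __init__(self, max_attempts, window_ms):
        self.max_attempts = max_attempts  # failures tolerated within the window
        self.window_ms = window_ms        # how far back failures still count
        self._failures = deque()          # failure timestamps, oldest first

    def record_failure(self, now_ms):
        # Assumes failures are recorded in non-decreasing time order.
        self._failures.append(now_ms)

    def failures_in_window(self, now_ms):
        # Drop failures that are window_ms old or older; they no longer count.
        cutoff = now_ms - self.window_ms
        while self._failures and self._failures[0] <= cutoff:
            self._failures.popleft()
        return len(self._failures)

    def should_retry(self, now_ms):
        # Retry only while the in-window failure count is below the maximum.
        return self.failures_in_window(now_ms) < self.max_attempts
```

With `max_attempts=2` and a one-second window, two failures in quick succession stop further retries, but once both failures have aged out of the window the application is treated as having a clean history again. This is the behaviour that keeps a long-running application from exhausting a fixed attempt budget over weeks of uptime.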