[ https://issues.apache.org/jira/browse/SPARK-20658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001917#comment-16001917 ]
Saisai Shao commented on SPARK-20658:
-------------------------------------

This mainly depends on YARN: YARN measures the failure validity interval and decides what counts as a failed AM; Spark just proxies the parameter through to YARN. So if there is any unexpected behavior, I think we should investigate the YARN side to see what actually happens.

> spark.yarn.am.attemptFailuresValidityInterval doesn't seem to have an effect
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-20658
>                 URL: https://issues.apache.org/jira/browse/SPARK-20658
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.1.0
>            Reporter: Paul Jones
>            Priority: Minor
>
> I'm running a job in YARN cluster mode with
> `spark.yarn.am.attemptFailuresValidityInterval=1h` specified both in
> spark-defaults.conf and in my spark-submit command. (The flag shows up in
> the Environment tab of the Spark history server, so it appears to be set
> correctly.)
>
> However, I just had a job die with four AM failures, three of which were
> more than an hour apart, so I'm confused about what could be going on. I
> haven't figured out the cause of the individual failures, so is it possible
> that certain types of failures are always counted? E.g. do attempts that
> are killed due to memory issues always count?

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
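For reference, a minimal sketch of the setup the reporter describes (the jar name, attempt count, and values are illustrative, not from the issue). Spark forwards the interval to YARN when submitting the application, and YARN then counts only AM failures that fall inside that sliding window against the attempt limit:

```shell
# spark-defaults.conf (illustrative):
#   spark.yarn.am.attemptFailuresValidityInterval  1h

# The same settings passed on the command line; spark-submit --conf values
# override spark-defaults.conf if both are present.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.am.attemptFailuresValidityInterval=1h \
  --conf spark.yarn.maxAppAttempts=4 \
  my-job.jar
```

Note that YARN itself also caps attempts via `yarn.resourcemanager.am.max-attempts` on the ResourceManager, so the effective limit is the smaller of the two; when diagnosing this, the ResourceManager logs for the application show how each AM failure was classified.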