[ https://issues.apache.org/jira/browse/SPARK-20658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001638#comment-16001638 ]
Marcelo Vanzin commented on SPARK-20658:
----------------------------------------

That's different... what version of the Hadoop libraries is part of the Spark build? Generally there will be Hadoop jars in {{$SPARK_HOME/jars}}; those are the ones that matter. (Alternatively, finding - or not finding - the log message I mentioned in your logs would have answered these questions already.)

> spark.yarn.am.attemptFailuresValidityInterval doesn't seem to have an effect
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-20658
>                 URL: https://issues.apache.org/jira/browse/SPARK-20658
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.1.0
>            Reporter: Paul Jones
>            Priority: Minor
>
> I'm running a job in YARN cluster mode with
> `spark.yarn.am.attemptFailuresValidityInterval=1h` set in both
> spark-defaults.conf and in my spark-submit command. (The setting shows up in
> the Environment tab of the Spark history server, so it appears to be set
> correctly.)
> However, I just had a job die with four AM failures, even though three of the
> four failures were more than an hour apart, so I'm confused about what could
> be going on. I haven't figured out the cause of the individual failures yet,
> so is it possible that certain types of failures always count against the
> limit? E.g. do attempts that are killed due to memory issues always count?
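For context on the question above about which Hadoop jars matter: the YARN setter behind this feature, {{ApplicationSubmissionContext.setAttemptFailuresValidityInterval}}, only exists in Hadoop 2.6+, and the Spark YARN client looks it up reflectively, logging a warning and skipping the setting when the method is absent. The sketch below is a paraphrase of that pattern under those assumptions, not the exact Spark source; the method name {{applyValidityInterval}} and the warning text are illustrative.

{code:scala}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext

// Paraphrased sketch: apply the AM attempt-failure validity interval only
// when the Hadoop/YARN libraries on the classpath actually support it.
def applyValidityInterval(appContext: ApplicationSubmissionContext,
                          intervalMs: Option[Long]): Unit = {
  intervalMs.foreach { interval =>
    try {
      // The setter only exists in YARN 2.6+, so resolve it reflectively.
      val method = appContext.getClass.getMethod(
        "setAttemptFailuresValidityInterval", classOf[Long])
      method.invoke(appContext, interval: java.lang.Long)
    } catch {
      case _: NoSuchMethodException =>
        // Older Hadoop jars in $SPARK_HOME/jars: the setting is dropped with
        // only a warning, which is the log message referred to above.
        println("WARN: spark.yarn.am.attemptFailuresValidityInterval is not " +
          "supported by this version of YARN; ignoring it.")
    }
  }
}
{code}

In other words, if the Hadoop client jars bundled with the Spark build are older than 2.6, the interval is never passed to the ResourceManager and every AM failure counts toward {{spark.yarn.maxAppAttempts}}, regardless of how far apart the failures are.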