[ https://issues.apache.org/jira/browse/SPARK-8167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Woody updated SPARK-8167:
---------------------------------
    Priority: Critical  (was: Major)

> Tasks that fail due to YARN preemption can cause job failure
> ------------------------------------------------------------
>
>                 Key: SPARK-8167
>                 URL: https://issues.apache.org/jira/browse/SPARK-8167
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler, YARN
>    Affects Versions: 1.3.1
>            Reporter: Patrick Woody
>            Priority: Critical
>
> Tasks that are running on preempted executors will count as FAILED with an
> ExecutorLostFailure. Unfortunately, this can quickly spiral out of control if
> a large resource shift is occurring and the tasks get rescheduled onto executors
> that are immediately preempted as well.
> The current workaround is to set spark.task.maxFailures very high, but
> that can delay the detection of true failures. We should ideally differentiate
> these task statuses so that preemption-related failures don't count towards the
> failure limit.
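
The proposal in the description, to keep preemption-related failures from counting towards the failure limit, could look roughly like the sketch below. This is only an illustration of the idea, not Spark's scheduler code: `FailureReason`, `TaskFailureTracker`, and `record_failure` are hypothetical names, and the only detail taken from Spark is that spark.task.maxFailures defaults to 4.

```python
from dataclasses import dataclass, field
from enum import Enum


class FailureReason(Enum):
    # Hypothetical classification of why a task attempt failed.
    TASK_ERROR = "task_error"          # genuine failure of the task itself
    EXECUTOR_PREEMPTED = "preempted"   # executor was reclaimed by the cluster manager


@dataclass
class TaskFailureTracker:
    """Hypothetical per-task failure counter (not Spark's actual implementation)."""

    max_failures: int = 4  # same default as spark.task.maxFailures
    counted: dict = field(default_factory=dict)

    def record_failure(self, task_id: int, reason: FailureReason) -> bool:
        """Record one failed attempt; return True if the job should be aborted."""
        if reason is FailureReason.EXECUTOR_PREEMPTED:
            # Preemption is not the task's fault: retry it without
            # counting the attempt against the limit.
            return False
        self.counted[task_id] = self.counted.get(task_id, 0) + 1
        return self.counted[task_id] >= self.max_failures
```

With a tracker like this, a task that is preempted many times in a row during a large resource shift never triggers an abort, while genuine task errors still fail the job after `max_failures` attempts, so the limit does not need to be raised.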