[ https://issues.apache.org/jira/browse/SPARK-8167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15185005#comment-15185005 ]

Jakub Dubovsky commented on SPARK-8167:
---------------------------------------

I have a question about this fix. In the Spark UI's active stages view 
there is an entry like this:

Tasks: Succeeded/Total
1480/2880 (1311 failed)

Does this count of failed tasks include tasks that "failed" only because of 
preemption?
It would be useful to know whether my job is genuinely failing (so I should 
fix it) or whether its resources are merely being preempted (so I should 
just wait).
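
In the meantime, a rough sketch of one way to tell the two apart yourself 
with a SparkListener; the listener class and its snapshot helper are made 
up here for illustration and are not part of Spark:

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
import org.apache.spark.{ExecutorLostFailure, TaskFailedReason}
import scala.collection.mutable

// Tallies task failures by reason, so that ExecutorLostFailure (the status
// tasks on a preempted executor receive) can be told apart from genuine
// task errors. Listener events arrive on a single listener-bus thread, so
// the unsynchronized mutable map is safe here.
class FailureReasonListener extends SparkListener {
  private val counts = mutable.Map.empty[String, Long].withDefaultValue(0L)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit =
    taskEnd.reason match {
      case _: ExecutorLostFailure => counts("ExecutorLostFailure") += 1
      case r: TaskFailedReason    => counts(r.toErrorString.takeWhile(_ != '\n')) += 1
      case _                      => // Success; nothing to count
    }

  // Snapshot of failure counts keyed by a one-line reason string.
  def snapshot: Map[String, Long] = counts.toMap
}

// Register it on an existing SparkContext:
//   sc.addSparkListener(new FailureReasonListener)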

Thank you

> Tasks that fail due to YARN preemption can cause job failure
> ------------------------------------------------------------
>
>                 Key: SPARK-8167
>                 URL: https://issues.apache.org/jira/browse/SPARK-8167
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler, YARN
>    Affects Versions: 1.3.1
>            Reporter: Patrick Woody
>            Assignee: Matt Cheah
>            Priority: Blocker
>             Fix For: 1.6.0
>
>
> Tasks that are running on preempted executors are counted as FAILED with an 
> ExecutorLostFailure. Unfortunately, this can quickly spiral out of control 
> during a large resource shift, when the tasks get rescheduled onto executors 
> that are immediately preempted as well.
> The current workaround is to set spark.task.maxFailures very high, but that 
> delays surfacing true failures. We should ideally differentiate these task 
> statuses so that they don't count towards the failure limit.
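
Until such a differentiation exists, a minimal sketch of the workaround 
described above, raising the retry limit when the context is built (the 
application name and the value 32 are illustrative only, not recommendations):

import org.apache.spark.{SparkConf, SparkContext}

// Workaround sketch: raise the per-task retry limit (spark.task.maxFailures,
// default 4) so that a burst of preemption-driven ExecutorLostFailures does
// not abort the job. The cost, as noted in the issue, is that genuinely
// failing tasks are retried many more times before the job aborts.
val conf = new SparkConf()
  .setAppName("preemption-tolerant-job")
  .set("spark.task.maxFailures", "32")
val sc = new SparkContext(conf)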



