[jira] [Commented] (SPARK-24387) Heartbeat-timeout executor is added back and used again

Jiang Xingbo (JIRA) Mon, 11 Jun 2018 15:05:25 -0700


    [ https://issues.apache.org/jira/browse/SPARK-24387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508832#comment-16508832 ]


Jiang Xingbo commented on SPARK-24387:
--------------------------------------

{quote}So I think there's a race condition that the backend may make offers 
before killing the executor. And since this is the only executor left, it's 
offered to the TaskScheduler and the retried task is scheduled to it.{quote}
IIUC, removing an executor due to a heartbeat timeout is treated as a 
SlaveLost, which causes a task failure for each task running on that executor 
and therefore blacklists the task from running on that executor again. So why 
can the executor still be offered to the retried task?
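
To make the expected behavior concrete, here is a toy Python model of it. This 
is not Spark code: ToyScheduler and all of its methods are made-up names for 
illustration, and it only captures the reasoning above (executor lost, running 
tasks fail, each failed task avoids that executor afterwards).

```python
# Toy model only: none of these names are Spark APIs.
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    live_executors: set = field(default_factory=set)
    # task id -> executors on which that task has already failed
    failed_on: dict = field(default_factory=dict)

    def executor_lost(self, executor, running_tasks):
        """Heartbeat timeout: drop the executor and fail its running tasks."""
        self.live_executors.discard(executor)
        for task in running_tasks:
            self.failed_on.setdefault(task, set()).add(executor)

    def executor_registered(self, executor):
        """The same executor id shows up again (the 'added back' case)."""
        self.live_executors.add(executor)

    def offers_for(self, task):
        """Executors that may be offered to the given task."""
        excluded = self.failed_on.get(task, set())
        return sorted(self.live_executors - excluded)


sched = ToyScheduler(live_executors={"exec-1"})
sched.executor_lost("exec-1", ["task-0"])   # heartbeat timeout
sched.executor_registered("exec-1")         # executor is added back
print(sched.offers_for("task-0"))           # retried task: no eligible executor
print(sched.offers_for("task-1"))           # a different task may still use it
```

Under this model the retried task would never be offered the executor it just 
failed on, even if that executor is added back, which is why the reported 
behavior looks surprising to me.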

> Heartbeat-timeout executor is added back and used again
> -------------------------------------------------------
>
>                 Key: SPARK-24387
>                 URL: https://issues.apache.org/jira/browse/SPARK-24387
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Rui Li
>            Priority: Major
>
> In our job, when there's only one task and one executor running, the 
> executor's heartbeat is lost and the driver decides to remove it. However, 
> the executor is added again and the task's retry attempt is scheduled to 
> that executor, almost immediately after the executor is marked as lost.



