[ https://issues.apache.org/jira/browse/SPARK-24387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508832#comment-16508832 ]
Jiang Xingbo commented on SPARK-24387:
--------------------------------------

{quote}So I think there's a race condition that the backend may make offers before killing the executor. And since this is the only executor left, it's offered to the TaskScheduler and the retried task is scheduled to it.{quote}

IIUC, removing an executor due to a heartbeat timeout is treated as a SlaveLost, which should trigger a task failure for each task running on that executor and therefore blacklist the task from running on that executor again. So why can the executor still be offered to the retried task?

> Heartbeat-timeout executor is added back and used again
> -------------------------------------------------------
>
>                 Key: SPARK-24387
>                 URL: https://issues.apache.org/jira/browse/SPARK-24387
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Rui Li
>            Priority: Major
>
> In our job, when there's only one task and one executor running, the
> executor's heartbeat is lost and driver decides to remove it. However, the
> executor is added again and the task's retry attempt is scheduled to that
> executor, almost immediately after the executor is marked as lost.