[ https://issues.apache.org/jira/browse/SPARK-27348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-27348:
------------------------------------

    Assignee: Apache Spark

> HeartbeatReceiver doesn't remove lost executors from CoarseGrainedSchedulerBackend
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-27348
>                 URL: https://issues.apache.org/jira/browse/SPARK-27348
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Shixiong Zhu
>            Assignee: Apache Spark
>            Priority: Major
>
> When a heartbeat timeout happens in HeartbeatReceiver, it doesn't remove the
> lost executors from CoarseGrainedSchedulerBackend. When an executor's
> connection is not shut down gracefully, CoarseGrainedSchedulerBackend may not
> receive a disconnect event. In that case, CoarseGrainedSchedulerBackend
> continues to treat the lost executor as alive and may ask TaskScheduler to
> run tasks on it. Such a task will never finish, and the job will hang forever.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
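
The failure mode described in the issue can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is not Spark's code, and every name in it (`ToyBackend`, `ToyHeartbeatReceiver`, `notify_backend`, `simulate`) is hypothetical. It shows a heartbeat tracker that expires a silent executor while the backend's own list of live executors is updated only if the tracker tells it to.

```python
class ToyBackend:
    """Hypothetical stand-in for the scheduler backend's view of live executors."""

    def __init__(self):
        self.alive = set()

    def register(self, executor_id):
        self.alive.add(executor_id)

    def remove(self, executor_id):
        self.alive.discard(executor_id)

    def offer_targets(self):
        # Executors the backend would still consider for new tasks.
        return sorted(self.alive)


class ToyHeartbeatReceiver:
    """Hypothetical heartbeat tracker; not the real HeartbeatReceiver."""

    def __init__(self, backend, timeout, notify_backend):
        self.backend = backend
        self.timeout = timeout
        self.notify_backend = notify_backend  # False models the reported bug
        self.last_seen = {}

    def heartbeat(self, executor_id, now):
        self.last_seen[executor_id] = now

    def expire(self, now):
        # Collect executors whose last heartbeat is older than the timeout.
        lost = [e for e, t in self.last_seen.items() if now - t > self.timeout]
        for e in lost:
            del self.last_seen[e]
            if self.notify_backend:
                self.backend.remove(e)
        return lost


def simulate(notify_backend):
    backend = ToyBackend()
    receiver = ToyHeartbeatReceiver(backend, timeout=10, notify_backend=notify_backend)
    for e in ("exec-1", "exec-2"):
        backend.register(e)
        receiver.heartbeat(e, now=0)
    # exec-2 goes silent and no disconnect event ever reaches the backend.
    receiver.heartbeat("exec-1", now=15)
    lost = receiver.expire(now=20)
    return lost, backend.offer_targets()
```

In this model, `simulate(False)` expires `exec-2` but the backend still lists it as a target for new tasks, which corresponds to the hang described above. `simulate(True)` removes it from the backend as well.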