[ https://issues.apache.org/jira/browse/SPARK-37688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
hujiahua updated SPARK-37688:
-----------------------------
    Description:
When an executor is no longer alive and `ExecutorMonitor` receives a late `SparkListenerBlockUpdated` event for it, the `onBlockUpdated` handler calls `ensureExecutorIsTracked`, which creates a new executor tracker with `UNKNOWN_RESOURCE_PROFILE_ID` for the dead executor. `ExecutorAllocationManager` does not remove executors whose resource profile is `UNKNOWN_RESOURCE_PROFILE_ID`, so the dead executor keeps occupying an executor slot and a new executor cannot be created in its place.

The ExecutorAllocationManager log looked like this:

21/08/24 15:38:14 WARN [spark-dynamic-executor-allocation] ExecutorAllocationManager: Not removing executor 34324 because the ResourceProfile was UNKNOWN!
21/08/24 15:38:14 WARN [spark-dynamic-executor-allocation] ExecutorAllocationManager: Not removing executor 34324 because the ResourceProfile was UNKNOWN!
21/08/24 15:38:14 WARN [spark-dynamic-executor-allocation] ExecutorAllocationManager: Not removing executor 34324 because the ResourceProfile was UNKNOWN!
21/08/24 15:38:14 WARN [spark-dynamic-executor-allocation] ExecutorAllocationManager: Not removing executor 34324 because the ResourceProfile was UNKNOWN!

> ExecutorMonitor should ignore SparkListenerBlockUpdated event if executor was not active
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-37688
>                 URL: https://issues.apache.org/jira/browse/SPARK-37688
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.2
>            Reporter: hujiahua
>            Priority: Major
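The sequence of events described above can be replayed with a small toy model. This is only a sketch under stated assumptions: `ToyExecutorMonitor`, `replay`, `stuck_executors`, and the `ignore_updates_for_untracked` flag are hypothetical names invented for illustration, and the class is not Spark's `ExecutorMonitor`. Only the method names `ensure_executor_is_tracked` and `on_block_updated` are modelled on the identifiers mentioned in the issue, and the guard shown is one possible reading of "ignore the event if the executor was not active".

```python
from dataclasses import dataclass, field

# Placeholder value for this sketch; it stands in for Spark's constant.
UNKNOWN_RESOURCE_PROFILE_ID = -1


@dataclass
class ToyExecutorMonitor:
    """Toy stand-in for ExecutorMonitor (hypothetical, not Spark code)."""

    # Hypothetical switch: True models the behaviour proposed in the issue.
    ignore_updates_for_untracked: bool
    # Maps executor id -> resource profile id.
    trackers: dict = field(default_factory=dict)

    def ensure_executor_is_tracked(self, executor_id, resource_profile_id):
        # Creates a tracker if none exists yet, as the issue describes.
        return self.trackers.setdefault(executor_id, resource_profile_id)

    def on_executor_added(self, executor_id, resource_profile_id):
        self.ensure_executor_is_tracked(executor_id, resource_profile_id)

    def on_executor_removed(self, executor_id):
        self.trackers.pop(executor_id, None)

    def on_block_updated(self, executor_id):
        if self.ignore_updates_for_untracked and executor_id not in self.trackers:
            return  # proposed behaviour: drop the late event
        # Reported behaviour: a dead executor is re-tracked with an unknown profile.
        self.ensure_executor_is_tracked(executor_id, UNKNOWN_RESOURCE_PROFILE_ID)

    def stuck_executors(self):
        # Executors an allocation manager would refuse to remove in this model.
        return [e for e, rp in self.trackers.items()
                if rp == UNKNOWN_RESOURCE_PROFILE_ID]


def replay(ignore_updates_for_untracked):
    """Add an executor, remove it, then deliver a late block update."""
    monitor = ToyExecutorMonitor(ignore_updates_for_untracked)
    monitor.on_executor_added("34324", resource_profile_id=0)
    monitor.on_executor_removed("34324")
    monitor.on_block_updated("34324")  # late event for the dead executor
    return monitor.stuck_executors()
```

In this model, `replay(False)` leaves executor 34324 tracked with the unknown profile, which corresponds to the repeated "Not removing executor" warnings, while `replay(True)` leaves nothing tracked.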