[ https://issues.apache.org/jira/browse/SPARK-37688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hujiahua updated SPARK-37688:
-----------------------------
    Description: 
When an executor is no longer alive and `ExecutorMonitor` receives a late
`SparkListenerBlockUpdated` event, the `onBlockUpdated` handler calls
`ensureExecutorIsTracked`, which creates a new executor tracker with
UNKNOWN_RESOURCE_PROFILE_ID for the dead executor. Because
`ExecutorAllocationManager` does not remove executors with
UNKNOWN_RESOURCE_PROFILE_ID, the dead executor keeps occupying an executor
slot, so a new executor cannot be created.

The ExecutorAllocationManager log looked like this:
21/08/24 15:38:14 WARN [spark-dynamic-executor-allocation] ExecutorAllocationManager: Not removing executor 34324 because the ResourceProfile was UNKNOWN!
21/08/24 15:38:14 WARN [spark-dynamic-executor-allocation] ExecutorAllocationManager: Not removing executor 34324 because the ResourceProfile was UNKNOWN!
21/08/24 15:38:14 WARN [spark-dynamic-executor-allocation] ExecutorAllocationManager: Not removing executor 34324 because the ResourceProfile was UNKNOWN!
21/08/24 15:38:14 WARN [spark-dynamic-executor-allocation] ExecutorAllocationManager: Not removing executor 34324 because the ResourceProfile was UNKNOWN!
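The race described above can be reduced to a small toy model. This is a hypothetical sketch, not Spark's code: `ToyExecutorMonitor`, `Tracker`, `onExecutorAdded`, `onExecutorRemoved` and `canRemove` are invented for illustration, and the sentinel value -1 for UNKNOWN_RESOURCE_PROFILE_ID is an assumption of the sketch. Only the names `onBlockUpdated`, `ensureExecutorIsTracked` and UNKNOWN_RESOURCE_PROFILE_ID come from the report.

```scala
import scala.collection.mutable

// Toy model of the late-event race (hypothetical, NOT Spark's implementation).
object LateBlockUpdateRace {
  // Assumed sentinel value for this sketch only.
  val UNKNOWN_RESOURCE_PROFILE_ID: Int = -1

  final class Tracker(val resourceProfileId: Int)

  final class ToyExecutorMonitor {
    val executors: mutable.Map[String, Tracker] = mutable.Map.empty

    def onExecutorAdded(id: String, rpId: Int): Unit =
      executors(id) = new Tracker(rpId)

    def onExecutorRemoved(id: String): Unit = {
      executors.remove(id)
      ()
    }

    // As described in the issue: a block update for any executor id, including
    // one that was already removed, ensures a tracker exists for it.
    def onBlockUpdated(id: String): Unit = {
      ensureExecutorIsTracked(id, UNKNOWN_RESOURCE_PROFILE_ID)
      ()
    }

    // Returns the existing tracker, or creates one with the given profile id.
    private def ensureExecutorIsTracked(id: String, rpId: Int): Tracker =
      executors.getOrElseUpdate(id, new Tracker(rpId))
  }

  // Hypothetical stand-in for the allocation manager's removal check that
  // refuses executors whose resource profile is UNKNOWN.
  def canRemove(monitor: ToyExecutorMonitor, id: String): Boolean =
    monitor.executors.get(id) match {
      case Some(t) if t.resourceProfileId == UNKNOWN_RESOURCE_PROFILE_ID =>
        println(s"Not removing executor $id because the ResourceProfile was UNKNOWN!")
        false
      case Some(_) => true
      case None    => false
    }
}
```

Adding executor 34324, removing it, and then delivering a late `onBlockUpdated("34324")` leaves a tracker with the UNKNOWN profile that `canRemove` refuses on every check, which is the shape of the repeated warning in the log.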

  was:
When an executor is no longer alive and ExecutorMonitor receives a late
SparkListenerBlockUpdated event, the `onBlockUpdated` handler calls
`ensureExecutorIsTracked`, which creates a new executor tracker with
UNKNOWN_RESOURCE_PROFILE_ID for the dead executor. Because
ExecutorAllocationManager does not remove executors with
UNKNOWN_RESOURCE_PROFILE_ID, the dead executor keeps occupying an executor
slot, so a new executor cannot be created.

The ExecutorAllocationManager log looked like this:
21/08/24 15:38:14 WARN [spark-dynamic-executor-allocation] ExecutorAllocationManager: Not removing executor 34324 because the ResourceProfile was UNKNOWN!
21/08/24 15:38:14 WARN [spark-dynamic-executor-allocation] ExecutorAllocationManager: Not removing executor 34324 because the ResourceProfile was UNKNOWN!
21/08/24 15:38:14 WARN [spark-dynamic-executor-allocation] ExecutorAllocationManager: Not removing executor 34324 because the ResourceProfile was UNKNOWN!
21/08/24 15:38:14 WARN [spark-dynamic-executor-allocation] ExecutorAllocationManager: Not removing executor 34324 because the ResourceProfile was UNKNOWN!


> ExecutorMonitor should ignore SparkListenerBlockUpdated event if executor was 
> not active
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-37688
>                 URL: https://issues.apache.org/jira/browse/SPARK-37688
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.2
>            Reporter: hujiahua
>            Priority: Major
>
> When an executor is no longer alive and `ExecutorMonitor` receives a late
> `SparkListenerBlockUpdated` event, the `onBlockUpdated` handler calls
> `ensureExecutorIsTracked`, which creates a new executor tracker with
> UNKNOWN_RESOURCE_PROFILE_ID for the dead executor. Because
> `ExecutorAllocationManager` does not remove executors with
> UNKNOWN_RESOURCE_PROFILE_ID, the dead executor keeps occupying an executor
> slot, so a new executor cannot be created.
> The ExecutorAllocationManager log looked like this:
> 21/08/24 15:38:14 WARN [spark-dynamic-executor-allocation] ExecutorAllocationManager: Not removing executor 34324 because the ResourceProfile was UNKNOWN!
> 21/08/24 15:38:14 WARN [spark-dynamic-executor-allocation] ExecutorAllocationManager: Not removing executor 34324 because the ResourceProfile was UNKNOWN!
> 21/08/24 15:38:14 WARN [spark-dynamic-executor-allocation] ExecutorAllocationManager: Not removing executor 34324 because the ResourceProfile was UNKNOWN!
> 21/08/24 15:38:14 WARN [spark-dynamic-executor-allocation] ExecutorAllocationManager: Not removing executor 34324 because the ResourceProfile was UNKNOWN!
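The change proposed in the issue title, ignoring `SparkListenerBlockUpdated` events for executors that are no longer active, might look roughly like the sketch below in a simplified, hypothetical model. It is not the actual patch: `ToyExecutorMonitor`, `Tracker` and the injected `isExecutorActive` predicate are invented for illustration, the issue does not say how the real monitor would learn which executors are alive, and the -1 sentinel for UNKNOWN_RESOURCE_PROFILE_ID is assumed.

```scala
import scala.collection.mutable

// Hypothetical sketch of the proposed guard (NOT Spark's implementation).
object IgnoreLateBlockUpdates {
  // Assumed sentinel value for this sketch only.
  val UNKNOWN_RESOURCE_PROFILE_ID: Int = -1

  final class Tracker(val resourceProfileId: Int)

  // `isExecutorActive` is an assumed liveness check supplied by the caller.
  final class ToyExecutorMonitor(isExecutorActive: String => Boolean) {
    val executors: mutable.Map[String, Tracker] = mutable.Map.empty

    def onExecutorAdded(id: String, rpId: Int): Unit =
      executors(id) = new Tracker(rpId)

    def onExecutorRemoved(id: String): Unit = {
      executors.remove(id)
      ()
    }

    // Proposed behaviour: a late block update for an executor that is no
    // longer active is dropped instead of re-creating a tracker with an
    // UNKNOWN resource profile.
    def onBlockUpdated(id: String): Unit =
      if (isExecutorActive(id)) {
        ensureExecutorIsTracked(id, UNKNOWN_RESOURCE_PROFILE_ID)
        ()
      }

    // Returns the existing tracker, or creates one with the given profile id.
    private def ensureExecutorIsTracked(id: String, rpId: Int): Tracker =
      executors.getOrElseUpdate(id, new Tracker(rpId))
  }
}
```

Injecting the liveness check as a function keeps the sketch independent of how the driver tracks executors. Block updates for live executors still go through `ensureExecutorIsTracked`, which leaves an existing tracker and its resource profile untouched.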



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
