[ 
https://issues.apache.org/jira/browse/SPARK-52752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Wang updated SPARK-52752:
------------------------------
    Description: 
I found some failed tasks with the failure reason "executor 260 exited unrelated
to the running tasks", but the executor log shows that the task had already
finished successfully before the executor was terminated.

!image-2025-07-10-18-21-57-192.png!

The relevant logs for this issue are collected below.

Executor 260 task logs:
{code:java}
25/07/10 17:54:14 INFO Executor: Finished task 193.0 in stage 462.0 (TID 23745). 187203 bytes result sent to driver
25/07/10 17:55:08 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 25061
25/07/10 17:55:08 INFO Executor: Running task 274.0 in stage 639.0 (TID 25061)
......
25/07/10 17:55:14 INFO MemoryStore: Block taskresult_25061 stored as bytes in memory (estimated size 1821.7 KiB, free 1477.0 MiB)
25/07/10 17:55:14 INFO Executor: Finished task 274.0 in stage 639.0 (TID 25061). 1865404 bytes result sent via BlockManager)
25/07/10 17:55:17 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
25/07/10 17:55:17 INFO MemoryStore: MemoryStore cleared
25/07/10 17:55:17 INFO BlockManager: BlockManager stopped
25/07/10 17:55:17 INFO ShutdownHookManager: Shutdown hook called
{code}
The driver's AsyncEventQueue dropped some events:
{code:java}
25/07/10 17:53:15 ERROR AsyncEventQueue: Dropping event from queue eventLog. This likely means one of the listeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.
{code}
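As a rough illustration of what "Dropping event from queue" means, here is a toy model of a bounded listener queue that discards new events once it is full. It is only a sketch, not Spark's implementation: the class name, method names, and event labels are hypothetical. Note also that the log above names only the eventLog queue; the excerpts do not show whether the queue feeding ExecutorMonitor dropped events as well.

```python
import queue


class BoundedEventQueue:
    """Toy bounded event queue that drops new events when it is full."""

    def __init__(self, capacity: int) -> None:
        self._events = queue.Queue(maxsize=capacity)
        self.dropped = []  # events that never reached any listener

    def post(self, event: str) -> bool:
        """Try to enqueue without blocking; record the event as dropped if full."""
        try:
            self._events.put_nowait(event)
            return True
        except queue.Full:
            self.dropped.append(event)
            return False

    def drain(self) -> list:
        """Deliver everything currently queued, in arrival order."""
        delivered = []
        while True:
            try:
                delivered.append(self._events.get_nowait())
            except queue.Empty:
                return delivered


# Hypothetical events: the slow listener has not drained anything yet,
# so the third event does not fit into a queue of capacity 2.
q = BoundedEventQueue(capacity=2)
for ev in ["event-1", "event-2", "event-3"]:
    q.post(ev)

delivered = q.drain()
print("delivered:", delivered)
print("dropped:  ", q.dropped)
```

A listener that derives state from such events never sees the dropped one, so any bookkeeping that depends on it can become stale.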
The last ExecutorMonitor$Tracker log entries for executor 260 in the driver:
{code:java}
25/07/10 17:54:16 INFO ExecutorMonitor$Tracker: Updating timeout for executor 260, delta: -1
25/07/10 17:54:16 INFO ExecutorMonitor$Tracker: Updating timeout for executor 260 to 100306447198500258 ns
{code}
Driver log showing executor 260 being killed due to idle timeout:
{code:java}
25/07/10 17:55:16 INFO YarnClusterSchedulerBackend: Requesting to kill executor(s) 260, 423
25/07/10 17:55:16 INFO YarnClusterSchedulerBackend: Actual list of executor(s) to be killed is 260
25/07/10 17:55:16 INFO ApplicationMaster$AMEndpoint: Driver requested to kill executor(s) 260.
25/07/10 17:55:16 INFO ExecutorAllocationManager: Executors 260 removed due to idle timeout.
{code}
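To make the timing explicit, the short sketch below computes the intervals between the logged events. The timestamps are copied from the excerpts above; the dictionary keys and the helper name are made up for illustration, and the 60-second value is only the default of spark.dynamicAllocation.executorIdleTimeout, assumed here because the actual setting of this job is not shown.

```python
from datetime import datetime

# Log timestamp format used in the excerpts: yy/MM/dd HH:mm:ss
FMT = "%y/%m/%d %H:%M:%S"

# Timestamps copied from the logs above; the keys are illustrative labels.
events = {
    "last_timeout_update": "25/07/10 17:54:16",  # driver: ExecutorMonitor$Tracker
    "task_assigned": "25/07/10 17:55:08",        # executor: Got assigned task 25061
    "task_finished": "25/07/10 17:55:14",        # executor: Finished task 274.0
    "kill_requested": "25/07/10 17:55:16",       # driver: Requesting to kill executor(s)
    "sigterm_received": "25/07/10 17:55:17",     # executor: RECEIVED SIGNAL TERM
}

# Assumption: default value of spark.dynamicAllocation.executorIdleTimeout.
ASSUMED_IDLE_TIMEOUT_S = 60


def seconds_between(start: str, end: str) -> int:
    """Whole seconds elapsed from event `start` to event `end`."""
    t0 = datetime.strptime(events[start], FMT)
    t1 = datetime.strptime(events[end], FMT)
    return int((t1 - t0).total_seconds())


idle_window = seconds_between("last_timeout_update", "kill_requested")
assigned_before_kill = seconds_between("task_assigned", "kill_requested")

print("timeout update -> kill request:", idle_window, "s")
print("matches assumed idle timeout:  ", idle_window == ASSUMED_IDLE_TIMEOUT_S)
print("task assigned -> kill request: ", assigned_before_kill, "s")
print("task finished -> SIGTERM:      ",
      seconds_between("task_finished", "sigterm_received"), "s")
```

Under that assumption, the kill request arrives exactly one idle-timeout interval after the last logged timeout update, even though task 25061 was assigned to the executor 8 seconds before the kill request.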

  was:
I found some failed tasks with the reason "executor 260 exited unrelated to the 
running tasks", but from the executor log I saw that it had run successfully.

!image-2025-07-10-18-21-36-377.png!


> Executor may be killed before task is finished due to DRA idle timedout
> -----------------------------------------------------------------------
>
>                 Key: SPARK-52752
>                 URL: https://issues.apache.org/jira/browse/SPARK-52752
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.5.0, 4.0.1
>            Reporter: Zhen Wang
>            Priority: Major
>         Attachments: image-2025-07-10-18-21-36-377.png, 
> image-2025-07-10-18-21-57-192.png
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
