[ 
https://issues.apache.org/jira/browse/SPARK-27228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukas Waldmann updated SPARK-27228:
-----------------------------------
    Description: When using dynamic allocations after all jobs finishes spark 
delays for several minutes before finally finishes. Log suggest that executors 
are not cleared up properly.  (was: When using dynamic allocations after all 
jobs finishes spark delays for several minutes before finally finishes. Log 
suggest that executors are not cleared up properly.
{quote}{{19/03/21 09:51:38 INFO SparkSession: PROCESSING FINISHED 19/03/21 
09:51:38 INFO ExecutorAllocationManager: Request to remove executorIds: 355 
19/03/21 09:51:38 INFO YarnClusterSchedulerBackend: Requesting to kill 
executor(s) 355 19/03/21 09:51:38 INFO YarnClusterSchedulerBackend: Actual list 
of executor(s) to be killed is 355 19/03/21 09:51:38 INFO 
ApplicationMaster$AMEndpoint: Driver requested to kill executor(s) 355. 
19/03/21 09:51:38 INFO ExecutorAllocationManager: Removing executor 355 because 
it has been idle for 60 seconds (new desired total will be 65) 19/03/21 
09:51:38 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 228. 
19/03/21 09:51:38 INFO DAGScheduler: Executor lost: 228 (epoch 446) 19/03/21 
09:51:38 INFO BlockManagerMasterEndpoint: Trying to remove executor 228 from 
BlockManagerMaster. 19/03/21 09:51:38 INFO BlockManagerMasterEndpoint: Removing 
block manager BlockManagerId(228, data-15.bdp.gin.merck.com, 45882, None) 
19/03/21 09:51:38 INFO BlockManagerMaster: Removed 228 successfully in 
removeExecutor 19/03/21 09:51:38 INFO SparkUI: Stopped Spark web UI at 
[http://data-04.bdp.gin.merck.com:44304|http://data-04.bdp.gin.merck.com:44304/]
 19/03/21 09:51:38 INFO YarnClusterScheduler: Executor 228 on 
data-15.bdp.gin.merck.com killed by driver. 19/03/21 09:51:38 INFO 
YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 346. 19/03/21 
09:51:38 INFO DAGScheduler: Executor lost: 346 (epoch 446) 19/03/21 09:51:38 
INFO BlockManagerMasterEndpoint: Trying to remove executor 346 from 
BlockManagerMaster. 19/03/21 09:51:38 INFO BlockManagerMasterEndpoint: Removing 
block manager BlockManagerId(346, datanode-02.bdp.gin.merck.com, 41186, None) 
19/03/21 09:51:38 INFO BlockManagerMaster: Removed 346 successfully in 
removeExecutor 19/03/21 09:51:38 INFO YarnClusterScheduler: Executor 346 on 
datanode-02.bdp.gin.merck.com killed by driver. 19/03/21 09:51:39 INFO 
YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 332. 19/03/21 
09:51:39 INFO DAGScheduler: Executor lost: 332 (epoch 446) 19/03/21 09:51:39 
INFO BlockManagerMasterEndpoint: Trying to remove executor 332 from 
BlockManagerMaster. 19/03/21 09:51:39 INFO BlockManagerMasterEndpoint: Removing 
block manager BlockManagerId(332, data-10.bdp.gin.merck.com, 38713, None) 
19/03/21 09:51:39 INFO BlockManagerMaster: Removed 332 successfully in 
removeExecutor 19/03/21 09:51:39 INFO YarnClusterScheduler: Executor 332 on 
data-10.bdp.gin.merck.com killed by driver. 19/03/21 09:51:39 INFO 
YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 240. 19/03/21 
09:51:39 INFO YarnClusterScheduler: Executor 240 on data-22.bdp.gin.merck.com 
killed by driver. 19/03/21 09:51:39 INFO DAGScheduler: Executor lost: 240 
(epoch 446) 19/03/21 09:51:39 INFO BlockManagerMasterEndpoint: Trying to remove 
executor 240 from BlockManagerMaster. 19/03/21 09:51:39 INFO 
BlockManagerMasterEndpoint: Removing block manager BlockManagerId(240, 
data-22.bdp.gin.merck.com, 43344, None) 19/03/21 09:51:39 INFO 
BlockManagerMaster: Removed 240 successfully in removeExecutor 19/03/21 
09:51:39 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 327. 
19/03/21 09:51:39 INFO DAGScheduler: Executor lost: 327 (epoch 446) 19/03/21 
09:51:39 INFO BlockManagerMasterEndpoint: Trying to remove executor 327 from 
BlockManagerMaster. 19/03/21 09:51:39 INFO BlockManagerMasterEndpoint: Removing 
block manager BlockManagerId(327, data-20.bdp.gin.merck.com, 34235, None) 
19/03/21 09:51:39 INFO YarnClusterScheduler: Executor 327 on 
data-20.bdp.gin.merck.com killed by driver. 19/03/21 09:51:39 INFO 
BlockManagerMaster: Removed 327 successfully in removeExecutor 19/03/21 
09:51:39 INFO YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 355. 
19/03/21 09:51:39 INFO YarnClusterScheduler: Executor 355 on 
data-20.bdp.gin.merck.com killed by driver. 19/03/21 09:51:39 INFO 
DAGScheduler: Executor lost: 355 (epoch 446) 19/03/21 09:51:39 INFO 
BlockManagerMasterEndpoint: Trying to remove executor 355 from 
BlockManagerMaster. 19/03/21 09:51:39 INFO BlockManagerMasterEndpoint: Removing 
block manager BlockManagerId(355, data-20.bdp.gin.merck.com, 43141, None) 
19/03/21 09:51:39 INFO BlockManagerMaster: Removed 355 successfully in 
removeExecutor 19/03/21 09:51:39 INFO YarnSchedulerBackend$YarnDriverEndpoint: 
Disabling executor 168. 19/03/21 09:51:39 INFO DAGScheduler: Executor lost: 168 
(epoch 446) 19/03/21 09:51:39 INFO YarnClusterScheduler: Executor 168 on 
data-07.bdp.gin.merck.com killed by driver. 19/03/21 09:51:39 INFO 
BlockManagerMasterEndpoint: Trying to remove executor 168 from 
BlockManagerMaster. 19/03/21 09:51:39 INFO BlockManagerMasterEndpoint: Removing 
block manager BlockManagerId(168, data-07.bdp.gin.merck.com, 44833, None) 
19/03/21 09:51:39 INFO BlockManagerMaster: Removed 168 successfully in 
removeExecutor 19/03/21 09:54:26 WARN HeartbeatReceiver: Removing executor 332 
with no recent heartbeats: 173942 ms exceeds timeout 120000 ms 19/03/21 
09:54:26 ERROR YarnClusterScheduler: Lost an executor 332 (already removed): 
Executor heartbeat timed out after 173942 ms 19/03/21 09:54:26 WARN 
HeartbeatReceiver: Removing executor 346 with no recent heartbeats: 172853 ms 
exceeds timeout 120000 ms 19/03/21 09:54:26 ERROR YarnClusterScheduler: Lost an 
executor 346 (already removed): Executor heartbeat timed out after 172853 ms 
19/03/21 09:54:26 WARN HeartbeatReceiver: Removing executor 355 with no recent 
heartbeats: 173169 ms exceeds timeout 120000 ms 19/03/21 09:54:26 ERROR 
YarnClusterScheduler: Lost an executor 355 (already removed): Executor 
heartbeat timed out after 173169 ms 19/03/21 09:54:26 WARN HeartbeatReceiver: 
Removing executor 168 with no recent heartbeats: 174129 ms exceeds timeout 
120000 ms 19/03/21 09:54:26 ERROR YarnClusterScheduler: Lost an executor 168 
(already removed): Executor heartbeat timed out after 174129 ms 19/03/21 
09:54:26 WARN HeartbeatReceiver: Removing executor 327 with no recent 
heartbeats: 169555 ms exceeds timeout 120000 ms 19/03/21 09:54:26 ERROR 
YarnClusterScheduler: Lost an executor 327 (already removed): Executor 
heartbeat timed out after 169555 ms 19/03/21 09:54:26 WARN HeartbeatReceiver: 
Removing executor 240 with no recent heartbeats: 177937 ms exceeds timeout 
120000 ms 19/03/21 09:54:26 ERROR YarnClusterScheduler: Lost an executor 240 
(already removed): Executor heartbeat timed out after 177937 ms 19/03/21 
09:54:26 WARN HeartbeatReceiver: Removing executor 228 with no recent 
heartbeats: 178171 ms exceeds timeout 120000 ms 19/03/21 09:54:26 ERROR 
YarnClusterScheduler: Lost an executor 228 (already removed): Executor 
heartbeat timed out after 178171 ms 19/03/21 09:54:26 INFO 
YarnClusterSchedulerBackend: Requesting to kill executor(s) 332 19/03/21 
09:54:26 WARN YarnClusterSchedulerBackend: Executor to kill 332 does not exist! 
19/03/21 09:54:26 INFO YarnClusterSchedulerBackend: Actual list of executor(s) 
to be killed is 19/03/21 09:54:26 INFO YarnClusterSchedulerBackend: Requesting 
to kill executor(s) 346 19/03/21 09:54:26 WARN YarnClusterSchedulerBackend: 
Executor to kill 346 does not exist! 19/03/21 09:54:26 INFO 
YarnClusterSchedulerBackend: Actual list of executor(s) to be killed is 
19/03/21 09:54:26 INFO YarnClusterSchedulerBackend: Requesting to kill 
executor(s) 355 19/03/21 09:54:26 WARN YarnClusterSchedulerBackend: Executor to 
kill 355 does not exist! 19/03/21 09:54:26 INFO YarnClusterSchedulerBackend: 
Actual list of executor(s) to be killed is 19/03/21 09:54:26 INFO 
YarnClusterSchedulerBackend: Requesting to kill executor(s) 168 19/03/21 
09:54:26 WARN YarnClusterSchedulerBackend: Executor to kill 168 does not exist! 
19/03/21 09:54:26 INFO YarnClusterSchedulerBackend: Actual list of executor(s) 
to be killed is 19/03/21 09:54:26 INFO YarnClusterSchedulerBackend: Requesting 
to kill executor(s) 327 19/03/21 09:54:26 WARN YarnClusterSchedulerBackend: 
Executor to kill 327 does not exist! 19/03/21 09:54:26 INFO 
YarnClusterSchedulerBackend: Actual list of executor(s) to be killed is 
19/03/21 09:54:26 INFO YarnClusterSchedulerBackend: Requesting to kill 
executor(s) 240 19/03/21 09:54:26 WARN YarnClusterSchedulerBackend: Executor to 
kill 240 does not exist! 19/03/21 09:54:26 INFO YarnClusterSchedulerBackend: 
Actual list of executor(s) to be killed is 19/03/21 09:54:26 INFO 
YarnClusterSchedulerBackend: Requesting to kill executor(s) 228 19/03/21 
09:54:26 WARN YarnClusterSchedulerBackend: Executor to kill 228 does not exist! 
19/03/21 09:54:26 INFO YarnClusterSchedulerBackend: Actual list of executor(s) 
to be killed is 19/03/21 09:56:43 INFO YarnClusterSchedulerBackend: Shutting 
down all executors 19/03/21 09:56:43 INFO 
YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down 
19/03/21 09:56:43 INFO SchedulerExtensionServices: Stopping 
SchedulerExtensionServices (serviceOption=None, services=List(), started=false) 
19/03/21 09:56:43 INFO MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped! 19/03/21 09:56:43 INFO MemoryStore: 
MemoryStore cleared 19/03/21 09:56:43 INFO BlockManager: BlockManager stopped 
19/03/21 09:56:43 INFO BlockManagerMaster: BlockManagerMaster stopped 19/03/21 
09:56:43 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
OutputCommitCoordinator stopped! 19/03/21 09:56:43 INFO SparkContext: 
Successfully stopped SparkContext 19/03/21 09:56:43 INFO ApplicationMaster: 
Final app status: SUCCEEDED, exitCode: 0 19/03/21 }}
{quote})

> Spark long delay on close, possible problem with killing executors
> ------------------------------------------------------------------
>
>                 Key: SPARK-27228
>                 URL: https://issues.apache.org/jira/browse/SPARK-27228
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager
>    Affects Versions: 2.3.0
>            Reporter: Lukas Waldmann
>            Priority: Major
>         Attachments: log.html
>
>
> When using dynamic allocations after all jobs finishes spark delays for 
> several minutes before finally finishes. Log suggest that executors are not 
> cleared up properly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to