This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push: new e081f06ea401 [SPARK-47121][CORE] Avoid RejectedExecutionExceptions during StandaloneSchedulerBackend shutdown e081f06ea401 is described below commit e081f06ea401a2b6b8c214a36126583d35eaf55f Author: Josh Rosen <joshro...@databricks.com> AuthorDate: Wed Feb 21 13:35:32 2024 -0800 [SPARK-47121][CORE] Avoid RejectedExecutionExceptions during StandaloneSchedulerBackend shutdown ### What changes were proposed in this pull request? This PR adds logic to avoid uncaught `RejectedExecutionException`s while `StandaloneSchedulerBackend` is shutting down. When the backend is shut down, its `stop()` method calls `executorDelayRemoveThread.shutdownNow()`. After this point, though, it's possible that its `StandaloneDriverEndpoint` might still process `onDisconnected` events and those might trigger calls to schedule new tasks on the `executorDelayRemoveThread`. This causes uncaught `java.util.concurrent.RejectedExecutionException`s to be thrown in RPC threads. This patch adds a `try-catch` to catch those exceptions and log a short warning if the exceptions occur while the scheduler is stopping. This approach is consistent with other similar code in Spark, including: - https://github.com/apache/spark/blob/9b53b803998001e4b706666e37e5f86f900a7430/core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala#L160-L163 - https://github.com/apache/spark/blob/9b53b803998001e4b706666e37e5f86f900a7430/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala#L754-L756 ### Why are the changes needed? Remove log and exception noise. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? No new tests: it is difficult to reliably reproduce the scenario that leads to the log noise. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #45203 from JoshRosen/reduce-scheduler-backend-shutdown-noise. Authored-by: Josh Rosen <joshro...@databricks.com> Signed-off-by: Dongjoon Hyun <dh...@apple.com> --- .../spark/scheduler/cluster/StandaloneSchedulerBackend.scala | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala b/core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala index bd1cb164b4be..2150b996f058 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala @@ -18,7 +18,7 @@ package org.apache.spark.scheduler.cluster import java.util.Locale -import java.util.concurrent.{Semaphore, TimeUnit} +import java.util.concurrent.{RejectedExecutionException, Semaphore, TimeUnit} import java.util.concurrent.atomic.AtomicBoolean import scala.concurrent.Future @@ -343,8 +343,14 @@ private[spark] class StandaloneSchedulerBackend( } } } - executorDelayRemoveThread.schedule(removeExecutorTask, - _executorRemoveDelay, TimeUnit.MILLISECONDS) + try { + executorDelayRemoveThread.schedule(removeExecutorTask, + _executorRemoveDelay, TimeUnit.MILLISECONDS) + } catch { + case _: RejectedExecutionException if stopping.get() => + logWarning( + "Skipping onDisconnected RemoveExecutor call because the scheduler is stopping") + } } } } --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org