Repository: spark
Updated Branches:
  refs/heads/master 1e5fcdf96 -> ad615291f


[SPARK-13519][CORE] Driver should tell Executor to stop itself when cleaning 
executor's state

## What changes were proposed in this pull request?

When the driver removes an executor's state, the connection between the driver 
and the executor may be still alive so that the executor cannot exit 
automatically (E.g., Master will send RemoveExecutor when a work is lost but 
the executor is still alive), so the driver should try to tell the executor to 
stop itself. Otherwise, we will leak an executor.

This PR modified the driver to send `StopExecutor` to the executor when it's 
removed.

## How was this patch tested?

manual test: increase the worker heartbeat interval to force it's always 
timeout and the leak executors are gone.

Author: Shixiong Zhu <shixi...@databricks.com>

Closes #11399 from zsxwing/SPARK-13519.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ad615291
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ad615291
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ad615291

Branch: refs/heads/master
Commit: ad615291fe76580ee59e3f48f4efe4627a01409d
Parents: 1e5fcdf
Author: Shixiong Zhu <shixi...@databricks.com>
Authored: Fri Feb 26 15:11:57 2016 -0800
Committer: Andrew Or <and...@databricks.com>
Committed: Fri Feb 26 15:11:57 2016 -0800

----------------------------------------------------------------------
 .../spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala  | 4 ++++
 1 file changed, 4 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/ad615291/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
----------------------------------------------------------------------
diff --git 
a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
 
b/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
index 0a5b09d..d151de5 100644
--- 
a/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
+++ 
b/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
@@ -179,6 +179,10 @@ class CoarseGrainedSchedulerBackend(scheduler: 
TaskSchedulerImpl, val rpcEnv: Rp
         context.reply(true)
 
       case RemoveExecutor(executorId, reason) =>
+        // We will remove the executor's state and cannot restore it. However, 
the connection
+        // between the driver and the executor may be still alive so that the 
executor won't exit
+        // automatically, so try to tell the executor to stop itself. See 
SPARK-13519.
+        
executorDataMap.get(executorId).foreach(_.executorEndpoint.send(StopExecutor))
         removeExecutor(executorId, reason)
         context.reply(true)
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

Reply via email to