GitHub user KaiXinXiaoLei opened a pull request:
https://github.com/apache/spark/pull/19951
[SPARK-22760][CORE][YARN] When sc.stop() is called, set stopped is true
before removing executors
## What changes were proposed in this pull request?
When the number of executors is large and YarnSchedulerBackend.stop() is running, an executor may disconnect before YarnSchedulerBackend.stopped is set to true. In that case YarnSchedulerBackend.onDisconnected() is still invoked and fails as follows:
{noformat}
17/12/12 15:34:45 INFO YarnClientSchedulerBackend: Asking each executor to shut down
17/12/12 15:34:45 INFO YarnClientSchedulerBackend: Disabling executor 63.
17/12/12 15:34:45 ERROR Inbox: Ignoring error
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it has been stopped.
	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:163)
	at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:133)
	at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192)
	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:516)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.reviveOffers(CoarseGrainedSchedulerBackend.scala:356)
	at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:497)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.disableExecutor(CoarseGrainedSchedulerBackend.scala:301)
	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint$$anonfun$onDisconnected$1.apply(YarnSchedulerBackend.scala:121)
	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint$$anonfun$onDisconnected$1.apply(YarnSchedulerBackend.scala:120)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint.onDisconnected(YarnSchedulerBackend.scala:120)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:142)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:217)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
So this change makes YarnSchedulerBackend.onDisconnected() check sc.isStopped before removing the executor; if sc.isStopped is true, the disable-executor message is not sent.
## How was this patch tested?
Manual tests: run "spark-sql --master yarn -f query.sql" many times; without this change the problem reproduces.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/KaiXinXiaoLei/spark pendingAdd11
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19951.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19951
commit c4dcc19ce8af02f99be18db8ddfe9b704086dd43
Author: hanghang <584620...@qq.com>
Date: 2017-12-11T23:53:52Z
change code
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org