[
https://issues.apache.org/jira/browse/SPARK-22760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
KaiXinXIaoLei updated SPARK-22760:
----------------------------------
Summary: where driver is stopping, and some executors lost because of
YarnSchedulerBackend.stop, then there is a problem. (was: where driver is
stopping, and some executors lost because of YarnSchedulerBackend.stop, then
there is a problem, )
> where driver is stopping, and some executors lost because of
> YarnSchedulerBackend.stop, then there is a problem.
> -----------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-22760
> URL: https://issues.apache.org/jira/browse/SPARK-22760
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, YARN
> Affects Versions: 2.2.1
> Reporter: KaiXinXIaoLei
> Attachments: 微信图片_20171212094100.jpg
>
>
> Using SPARK-14228, I found a problem:
> 17/12/11 22:38:33 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Executor for container container_e02_1509517131757_0001_01_000002 exited because of a YARN event (e.g., pre-emption) and not because of an error in the running job.
> 17/12/11 22:38:33 ERROR YarnClientSchedulerBackend: Could not find CoarseGrainedScheduler or it has been stopped.
> org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it has been stopped.
>     at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:163)
>     at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:128)
>     at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:231)
>     at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:515)
>     at org.apache.spark.rpc.RpcEndpointRef.ask(RpcEndpointRef.scala:62)
>     at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:392)
>     at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receive$1.applyOrElse(YarnSchedulerBackend.scala:259)
>     at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
>     at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
> I analyzed the cause. When the number of executors is large,
> YarnSchedulerBackend.stopped is still false while YarnSchedulerBackend.stop()
> is running. If an executor stops during that window,
> YarnSchedulerBackend.onDisconnected() is called, and the error above occurs.
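
The ordering described in the report can be illustrated with a small toy model. This is not Spark code: ToyDispatcher, ToyBackend, the stopping flag, and the guard_while_stopping switch are all hypothetical names invented for this sketch. It only assumes what the report states, namely that the stopped flag is still false while stop() is running and that an executor disconnecting in that window triggers a remove-executor request to an endpoint that is already gone. The guard is one possible mitigation shown for contrast, not a description of any actual fix.

```python
class EndpointStopped(Exception):
    """Stands in for the SparkException seen in the log."""


class ToyDispatcher:
    """Hypothetical stand-in for an RPC dispatcher that tracks registered endpoints."""

    def __init__(self):
        self.endpoints = {"CoarseGrainedScheduler"}

    def ask(self, name, message):
        # Fail the same way the log does when the endpoint is no longer registered.
        if name not in self.endpoints:
            raise EndpointStopped(f"Could not find {name} or it has been stopped.")
        return f"{name} handled {message}"


class ToyBackend:
    """Toy model of a scheduler backend; all names here are assumptions."""

    def __init__(self, guard_while_stopping):
        self.dispatcher = ToyDispatcher()
        self.stopped = False    # only set once stop() has finished
        self.stopping = False   # hypothetical extra flag set when stop() begins
        self.guard_while_stopping = guard_while_stopping
        self.errors = []

    def on_disconnected(self, executor_id):
        # Skip the request if shutdown has finished, or (with the guard) has begun.
        if self.stopped or (self.guard_while_stopping and self.stopping):
            return
        try:
            self.dispatcher.ask("CoarseGrainedScheduler",
                                f"RemoveExecutor({executor_id})")
        except EndpointStopped as exc:
            self.errors.append(str(exc))

    def stop(self, executors_lost_during_stop):
        self.stopping = True
        # The scheduler endpoint goes away first ...
        self.dispatcher.endpoints.discard("CoarseGrainedScheduler")
        # ... executors disconnect while stop() is still running ...
        for executor_id in executors_lost_during_stop:
            self.on_disconnected(executor_id)
        # ... and only now does the stopped flag flip.
        self.stopped = True


def run(guard_while_stopping):
    backend = ToyBackend(guard_while_stopping)
    backend.stop(executors_lost_during_stop=["1", "2"])
    return backend.errors


if __name__ == "__main__":
    print("without guard:", run(False))
    print("with guard:   ", run(True))
```

Without the guard, each executor lost during stop() records the "Could not find CoarseGrainedScheduler or it has been stopped." error. With the guard, the disconnects are ignored once shutdown has begun.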