[
https://issues.apache.org/jira/browse/SPARK-22760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
KaiXinXIaoLei updated SPARK-22760:
----------------------------------
Description:
Even with the fix from SPARK-14228, I still see a problem:
{noformat}
17/12/12 15:34:45 INFO YarnClientSchedulerBackend: Asking each executor to shut down
17/12/12 15:34:45 INFO YarnClientSchedulerBackend: Disabling executor 63.
17/12/12 15:34:45 ERROR Inbox: Ignoring error
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it has been stopped.
    at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:163)
    at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:133)
    at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192)
    at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:516)
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.reviveOffers(CoarseGrainedSchedulerBackend.scala:356)
    at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:497)
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.disableExecutor(CoarseGrainedSchedulerBackend.scala:301)
    at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint$$anonfun$onDisconnected$1.apply(YarnSchedulerBackend.scala:121)
    at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint$$anonfun$onDisconnected$1.apply(YarnSchedulerBackend.scala:120)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint.onDisconnected(YarnSchedulerBackend.scala:120)
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:142)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:217)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
And sometimes the following problem also occurs:
{noformat}
17/12/11 15:50:53 INFO YarnClientSchedulerBackend: Stopped
17/12/11 15:50:53 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/12/11 15:50:53 ERROR Inbox: Ignoring error
org.apache.spark.SparkException: Unsupported message OneWayMessage(101.8.73.53:42930,RemoveExecutor(68,Executor for container container_e05_1512975871311_0007_01_000069 exited because of a YARN event (e.g., pre-emption) and not because of an error in the running job.)) from 101.8.73.53:42930
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1$$anonfun$apply$mcV$sp$2.apply(Inbox.scala:118)
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1$$anonfun$apply$mcV$sp$2.apply(Inbox.scala:117)
    at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:126)
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
    at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:154)
    at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:134)
    at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:186)
    at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:512)
    at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$org$apache$spark$scheduler$cluster$YarnSchedulerBackend$$handleExecutorDisconnectedFromDriver$1.apply(YarnSchedulerBackend.scala:255)
    at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$org$apache$spark$scheduler$cluster$YarnSchedulerBackend$$handleExecutorDisconnectedFromDriver$1.apply(YarnSchedulerBackend.scala:255)
    at scala.util.Success.foreach(Try.scala:236)
    at scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:206)
    at scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:206)
{noformat}
Here is my analysis of the cause: when the number of executors is large, YarnSchedulerBackend.stopped is still false while YarnSchedulerBackend.stop() is running. If an executor disconnects in that window, YarnSchedulerBackend.onDisconnected() is still invoked and tries to message the CoarseGrainedScheduler endpoint, which has already been removed, so the errors above are logged.
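To make the suspected race easier to see, here is a minimal, self-contained Scala sketch of it, together with the kind of guard that could close the window. This is only an illustration under simplified assumptions: {{stopped}}, {{driverEndpointAlive}} and {{disableExecutor}} below are stand-ins, not the real YarnSchedulerBackend / CoarseGrainedSchedulerBackend code.
{code:scala}
// Simplified model of the suspected race (NOT Spark's real code): stop() tears
// down the driver endpoint, but late onDisconnected() callbacks can still try
// to use it.
import java.util.concurrent.atomic.AtomicBoolean

object StopRaceSketch {
  // Stand-in for YarnSchedulerBackend.stopped.
  private val stopped = new AtomicBoolean(false)
  // Stand-in for the CoarseGrainedScheduler RPC endpoint being registered.
  @volatile private var driverEndpointAlive = true

  // Stand-in for DriverEndpoint.disableExecutor -> reviveOffers -> RPC send.
  private def disableExecutor(id: Int): Unit = {
    if (!driverEndpointAlive) {
      // Roughly where "Could not find CoarseGrainedScheduler or it has been
      // stopped" surfaces in the real logs.
      throw new IllegalStateException(s"CoarseGrainedScheduler stopped; cannot disable executor $id")
    }
    println(s"Disabling executor $id")
  }

  // Callback fired when an executor disconnects, possibly while stop() runs.
  def onDisconnected(executorId: Int): Unit = {
    if (stopped.get()) {
      // Possible guard: once the backend is stopping, ignore the disconnect
      // instead of messaging the dead endpoint.
      println(s"Ignoring disconnect of executor $executorId: backend is stopping")
    } else {
      disableExecutor(executorId)
    }
  }

  def stop(): Unit = {
    // Marking stopped before tearing down the endpoint closes the window in
    // which onDisconnected() would hit a removed endpoint.
    stopped.set(true)
    driverEndpointAlive = false
  }

  def main(args: Array[String]): Unit = {
    stop()
    onDisconnected(63) // arrives late; with the guard it is ignored, not an error
  }
}
{code}
With a guard like this, a disconnect that arrives while the backend is stopping is ignored instead of being forwarded to the already-removed endpoint, so the "Could not find CoarseGrainedScheduler" errors would not be logged during shutdown.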
> When the driver is stopping and some executors are lost during
> YarnSchedulerBackend.stop(), errors are thrown.
> -----------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-22760
> URL: https://issues.apache.org/jira/browse/SPARK-22760
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, YARN
> Affects Versions: 2.2.1
> Reporter: KaiXinXIaoLei
> Attachments: 微信图片_20171212094100.jpg
>
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]