[ https://issues.apache.org/jira/browse/SPARK-22760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
KaiXinXIaoLei updated SPARK-22760:
----------------------------------
    Description: 
Using SPARK-14228, I found a problem:

17/12/12 15:34:45 INFO YarnClientSchedulerBackend: Asking each executor to shut down
17/12/12 15:34:45 INFO YarnClientSchedulerBackend: Disabling executor 63.
17/12/12 15:34:45 ERROR Inbox: Ignoring error
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it has been stopped.
	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:163)
	at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:133)
	at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192)
	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:516)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.reviveOffers(CoarseGrainedSchedulerBackend.scala:356)
	at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:497)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.disableExecutor(CoarseGrainedSchedulerBackend.scala:301)
	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint$$anonfun$onDisconnected$1.apply(YarnSchedulerBackend.scala:121)
	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint$$anonfun$onDisconnected$1.apply(YarnSchedulerBackend.scala:120)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint.onDisconnected(YarnSchedulerBackend.scala:120)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:142)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:217)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

And sometimes:

17/12/11 15:50:53 INFO YarnClientSchedulerBackend: Stopped
17/12/11 15:50:53 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/12/11 15:50:53 ERROR Inbox: Ignoring error
org.apache.spark.SparkException: Unsupported message OneWayMessage(101.8.73.53:42930,RemoveExecutor(68,Executor for container container_e05_1512975871311_0007_01_000069 exited because of a YARN event (e.g., pre-emption) and not because of an error in the running job.)) from 101.8.73.53:42930
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1$$anonfun$apply$mcV$sp$2.apply(Inbox.scala:118)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1$$anonfun$apply$mcV$sp$2.apply(Inbox.scala:117)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$receive$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:126)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:154)
	at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:134)
	at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:186)
	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:512)
	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$org$apache$spark$scheduler$cluster$YarnSchedulerBackend$$handleExecutorDisconnectedFromDriver$1.apply(YarnSchedulerBackend.scala:255)
	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$org$apache$spark$scheduler$cluster$YarnSchedulerBackend$$handleExecutorDisconnectedFromDriver$1.apply(YarnSchedulerBackend.scala:255)
	at scala.util.Success.foreach(Try.scala:236)
	at scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:206)
	at scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:206)

I analyzed the cause. When the number of executors is large, some executors are still shutting down while YarnSchedulerBackend.stop() is running, before the stopped flag takes effect. YarnSchedulerBackend.onDisconnected() is then called for those executors, and the errors above occur.

  was:
Using SPARK-14228, I found a problem:

17/12/11 22:38:33 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Executor for container container_e02_1509517131757_0001_01_000002 exited because of a YARN event (e.g., pre-emption) and not because of an error in the running job.
17/12/11 22:38:33 ERROR YarnClientSchedulerBackend: Could not find CoarseGrainedScheduler or it has been stopped.
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it has been stopped.
	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:163)
	at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:128)
	at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:231)
	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:515)
	at org.apache.spark.rpc.RpcEndpointRef.ask(RpcEndpointRef.scala:62)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:392)
	at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receive$1.applyOrElse(YarnSchedulerBackend.scala:259)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)

I analyzed the cause. When the number of executors is large, some executors are still shutting down while YarnSchedulerBackend.stop() is running, before the stopped flag takes effect. YarnSchedulerBackend.onDisconnected() is then called for those executors, and the errors above occur.


> where driver is stopping, and some executors lost because of
> YarnSchedulerBackend.stop, then there is a problem.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-22760
>                 URL: https://issues.apache.org/jira/browse/SPARK-22760
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 2.2.1
>            Reporter: KaiXinXIaoLei
>         Attachments: 微信图片_20171212094100.jpg
>
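The race analyzed above can be sketched in a minimal, self-contained way. This is hypothetical code (the class and method names below are illustrative, not Spark's actual implementation): it shows the guard pattern where a stopped flag is set when shutdown begins and checked under the same lock by the disconnect handler, so executor losses that arrive during shutdown are dropped instead of being forwarded to an endpoint that no longer exists.

```java
// Hypothetical sketch of the guard the analysis suggests -- not Spark's
// real code. Once stop() has begun, executor-disconnect events are
// expected and must be ignored rather than posted as RemoveExecutor
// messages to an already-stopped CoarseGrainedScheduler endpoint.
import java.util.ArrayList;
import java.util.List;

public class BackendStopRaceSketch {
    private boolean stopped = false;                          // guarded by 'this'
    private final List<String> delivered = new ArrayList<>(); // stands in for RPC sends

    // Shutdown entry point: after this, disconnects are expected losses.
    public synchronized void stop() {
        stopped = true;
    }

    // Called when an executor's RPC connection drops.
    public synchronized void onDisconnected(String executorId) {
        if (stopped) {
            // Shutting down: the executor loss is expected, so drop the
            // event instead of raising "Could not find CoarseGrainedScheduler".
            return;
        }
        delivered.add("RemoveExecutor(" + executorId + ")");
    }

    public List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        BackendStopRaceSketch backend = new BackendStopRaceSketch();
        backend.onDisconnected("63"); // normal loss: event is forwarded
        backend.stop();
        backend.onDisconnected("68"); // loss during shutdown: silently dropped
        System.out.println(backend.delivered()); // prints [RemoveExecutor(63)]
    }
}
```

Because both methods synchronize on the same monitor, a disconnect can never observe a half-finished stop(): it either runs entirely before the flag is set (and forwards the event) or entirely after (and ignores it), which is the window the reporter's logs show being hit.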
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)