[ https://issues.apache.org/jira/browse/SPARK-20266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jerry.X.He updated SPARK-20266: ------------------------------- Docs Text: I have been sturtured an cluster of Spark tomorrow, and today I want to run an WordCount program on cluster my environment : jdk1.8.121 + scala2.10.4 + hadoop 2.6.5 + spark1.6.2 cluster : master + slave01 + slave02 client : client demo code as below : def main(args :Array[String]) = { val inputPath = "hdfs://master/970655147/input/01WordCount/" // 1. localMode // val conf = new SparkConf().setMaster("local").setAppName("WordCount") // 2. standaloneMode val conf = new SparkConf().setMaster("spark://master:7077").setAppName("WordCount") .set("spark.executor.memory", "64M") .set("spark.executor.cores", "1") val sc = new SparkContext(conf) val line = sc.textFile(inputPath) line.foreach(println) sc.stop } first, I use localMode, then program run normally second, I run it on cluster[deploy in idea], but failed, it seems there are interactive blocked, according to the log, Master allocate Executor for app, and then Executor run, but it seems blocked at "UserGroupInformation.doAs(UserGroupInformation.java:1643); SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)", so executor not register for driver, and then driver got none resources and next I tried use spark-submit[spark-submit --master spark://master:7077 --class com.hx.test.Test01WordCount HelloSpark.jar] on client, or slave02, but the result are the same please give me some advice, thx some log info as follw : -------------------------------------------------------------------------------------------- 1. ExecutorBackend's bootstrap cmd root@slave02:~# jps 7984 CoarseGrainedExecutorBackend 6468 NodeManager 8037 Jps 955 Worker 7981 CoarseGrainedExecutorBackend 7982 CoarseGrainedExecutorBackend 6366 DataNode 7983 CoarseGrainedExecutorBackend root@slave02:~# ps -ef | grep 7983 root 7983 955 14 06:21 ? 00:00:03 /usr/local/ProgramFiles/jdk1.8.0_121/bin/java -cp /usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/conf/:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/ProgramFiles/hadoop-2.6.5/etc/hadoop/ -Xms64M -Xmx64M -Dspark.driver.port=37230 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@192.168.0.191:37230 --executor-id 1 --hostname 192.168.0.182 --cores 1 --app-id app-20170408062155-0015 --worker-url spark://Worker@192.168.0.182:46466 root 8050 4249 4 06:22 pts/1 00:00:00 grep --color=auto 7983 root@slave02:~# -------------------------------------------------------------------------------------------- 2. executor's error log root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# cat work/app-20170408062155-0015/0/stderr 17/04/08 06:22:20 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT] 17/04/08 06:22:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 17/04/08 06:22:28 INFO spark.SecurityManager: Changing view acls to: root 17/04/08 06:22:28 INFO spark.SecurityManager: Changing modify acls to: root 17/04/08 06:22:28 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 17/04/08 06:23:06 INFO spark.SecurityManager: Changing view acls to: root 17/04/08 06:23:06 INFO spark.SecurityManager: Changing modify acls to: root 17/04/08 06:23:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 17/04/08 06:23:24 INFO slf4j.Slf4jLogger: Slf4jLogger started 17/04/08 06:23:29 INFO Remoting: Starting remoting Exception in thread "main" 17/04/08 06:23:46 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 17/04/08 06:23:47 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643) at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68) at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:151) at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:253) at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at akka.remote.Remoting.start(Remoting.scala:179) at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184) at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:620) at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:617) at akka.actor.ActorSystemImpl._start(ActorSystem.scala:617) at akka.actor.ActorSystemImpl.start(ActorSystem.scala:634) at akka.actor.ActorSystem$.apply(ActorSystem.scala:142) at akka.actor.ActorSystem$.apply(ActorSystem.scala:119) at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121) at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53) at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:52) at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2024) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2015) at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:55) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:266) at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:217) at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:186) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) ... 4 more root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# cat work/app-20170408062155-0015/0/stdout root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# -------------------------------------------------------------------------------------------- 3. driver's log Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 17/04/08 21:21:44 INFO SparkContext: Running Spark version 1.6.2 17/04/08 21:21:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 17/04/08 21:21:45 INFO SecurityManager: Changing view acls to: root 17/04/08 21:21:45 INFO SecurityManager: Changing modify acls to: root 17/04/08 21:21:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 17/04/08 21:21:46 INFO Utils: Successfully started service 'sparkDriver' on port 37230. 17/04/08 21:21:47 INFO Slf4jLogger: Slf4jLogger started 17/04/08 21:21:47 INFO Remoting: Starting remoting 17/04/08 21:21:48 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.0.191:43974] 17/04/08 21:21:48 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 43974. 17/04/08 21:21:48 INFO SparkEnv: Registering MapOutputTracker 17/04/08 21:21:48 INFO SparkEnv: Registering BlockManagerMaster 17/04/08 21:21:48 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-ef79b656-b7f4-4cb3-be3e-0f8bb61baa9d 17/04/08 21:21:48 INFO MemoryStore: MemoryStore started with capacity 431.3 MB 17/04/08 21:21:48 INFO SparkEnv: Registering OutputCommitCoordinator 17/04/08 21:21:54 INFO Utils: Successfully started service 'SparkUI' on port 4040. 17/04/08 21:21:54 INFO SparkUI: Started SparkUI at http://192.168.0.191:4040 17/04/08 21:21:54 INFO AppClient$ClientEndpoint: Connecting to master spark://master:7077... 17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20170408062155-0015 17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/0 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/0 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/1 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/1 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/2 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/2 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/3 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/3 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/4 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/4 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/5 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/5 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/6 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/6 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/7 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/7 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 17/04/08 21:21:55 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42255. 17/04/08 21:21:55 INFO NettyBlockTransferService: Server created on 42255 17/04/08 21:21:56 INFO BlockManagerMaster: Trying to register BlockManager 17/04/08 21:21:57 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.191:42255 with 431.3 MB RAM, BlockManagerId(driver, 192.168.0.191, 42255) 17/04/08 21:21:57 INFO BlockManagerMaster: Registered BlockManager 17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/0 is now RUNNING 17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/1 is now RUNNING 17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/2 is now RUNNING 17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/3 is now RUNNING 17/04/08 21:22:00 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/4 is now RUNNING 17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/5 is now RUNNING 17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/6 is now RUNNING 17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/7 is now RUNNING 17/04/08 21:22:03 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0 17/04/08 21:22:05 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 107.7 KB, free 107.7 KB) 17/04/08 21:22:06 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 9.8 KB, free 117.5 KB) 17/04/08 21:22:06 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.191:42255 (size: 9.8 KB, free: 431.2 MB) 17/04/08 21:22:06 INFO SparkContext: Created broadcast 0 from textFile at Test01WordCount.scala:30 17/04/08 21:22:21 INFO FileInputFormat: Total input paths to process : 1 17/04/08 21:22:21 INFO SparkContext: Starting job: foreach at Test01WordCount.scala:33 17/04/08 21:22:21 INFO DAGScheduler: Got job 0 (foreach at Test01WordCount.scala:33) with 2 output partitions 17/04/08 21:22:21 INFO DAGScheduler: Final stage: ResultStage 0 (foreach at Test01WordCount.scala:33) 17/04/08 21:22:21 INFO DAGScheduler: Parents of final stage: List() 17/04/08 21:22:21 INFO DAGScheduler: Missing parents: List() 17/04/08 21:22:21 INFO DAGScheduler: Submitting ResultStage 0 (hdfs://master/970655147/input/01WordCount/ MapPartitionsRDD[1] at textFile at Test01WordCount.scala:30), which has no missing parents 17/04/08 21:22:21 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.0 KB, free 120.5 KB) 17/04/08 21:22:21 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1842.0 B, free 122.3 KB) 17/04/08 21:22:21 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.0.191:42255 (size: 1842.0 B, free: 431.2 MB) 17/04/08 21:22:21 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006 17/04/08 21:22:21 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (hdfs://master/970655147/input/01WordCount/ MapPartitionsRDD[1] at textFile at Test01WordCount.scala:30) 17/04/08 21:22:21 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks 17/04/08 21:22:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:23:04 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:23:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:23:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:23:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:23:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:24:02 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/1 is now EXITED (Command exited with code 1) 17/04/08 21:24:02 INFO SparkDeploySchedulerBackend: Executor app-20170408062155-0015/1 removed: Command exited with code 1 17/04/08 21:24:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:24:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:24:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:24:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:25:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:25:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:25:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:25:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:26:02 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Command exited with code 1)] in 1 attempts org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:370) at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.executorRemoved(SparkDeploySchedulerBackend.scala:144) at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:184) at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204) at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) ... 12 more -------------------------------------------------------------------------------------------- was: I have been sturtured an cluster of Spark tomorrow, and today I want to run an WordCount program on cluster my environment : jdk1.8.121 + scala2.10.4 + hadoop 2.6.5 + spark1.6.2 cluster : master + slave01 + slave02 client : client demo code as below : def main(args :Array[String]) = { val inputPath = "hdfs://master/970655147/input/01WordCount/" // 1. localMode // val conf = new SparkConf().setMaster("local").setAppName("WordCount") // 2. standaloneMode val conf = new SparkConf().setMaster("spark://master:7077").setAppName("WordCount") .set("spark.executor.memory", "64M") .set("spark.executor.cores", "1") val sc = new SparkContext(conf) val line = sc.textFile(inputPath) line.foreach(println) sc.stop } first, I use localMode, then program run normally second, I run it on cluster[deploy in idea], but failed, it seems there are interactive blocked, according to the log, Master allocate Executor for app, and then Executor run, but it seems blocked at "UserGroupInformation.doAs(UserGroupInformation.java:1643); SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)" and next I tried use spark-submit[spark-submit --master spark://master:7077 --class com.hx.test.Test01WordCount HelloSpark.jar] on client, or slave02, but the result are the same please give me some advice, thx some log info as follw : -------------------------------------------------------------------------------------------- 1. ExecutorBackend's bootstrap cmd root@slave02:~# jps 7984 CoarseGrainedExecutorBackend 6468 NodeManager 8037 Jps 955 Worker 7981 CoarseGrainedExecutorBackend 7982 CoarseGrainedExecutorBackend 6366 DataNode 7983 CoarseGrainedExecutorBackend root@slave02:~# ps -ef | grep 7983 root 7983 955 14 06:21 ? 00:00:03 /usr/local/ProgramFiles/jdk1.8.0_121/bin/java -cp /usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/conf/:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/spark-assembly-1.6.2-hadoop2.6.0.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/ProgramFiles/hadoop-2.6.5/etc/hadoop/ -Xms64M -Xmx64M -Dspark.driver.port=37230 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@192.168.0.191:37230 --executor-id 1 --hostname 192.168.0.182 --cores 1 --app-id app-20170408062155-0015 --worker-url spark://Worker@192.168.0.182:46466 root 8050 4249 4 06:22 pts/1 00:00:00 grep --color=auto 7983 root@slave02:~# -------------------------------------------------------------------------------------------- 2. executor's error log root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# cat work/app-20170408062155-0015/0/stderr 17/04/08 06:22:20 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT] 17/04/08 06:22:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 17/04/08 06:22:28 INFO spark.SecurityManager: Changing view acls to: root 17/04/08 06:22:28 INFO spark.SecurityManager: Changing modify acls to: root 17/04/08 06:22:28 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 17/04/08 06:23:06 INFO spark.SecurityManager: Changing view acls to: root 17/04/08 06:23:06 INFO spark.SecurityManager: Changing modify acls to: root 17/04/08 06:23:06 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 17/04/08 06:23:24 INFO slf4j.Slf4jLogger: Slf4jLogger started 17/04/08 06:23:29 INFO Remoting: Starting remoting Exception in thread "main" 17/04/08 06:23:46 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 17/04/08 06:23:47 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643) at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68) at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:151) at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:253) at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at akka.remote.Remoting.start(Remoting.scala:179) at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184) at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:620) at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:617) at akka.actor.ActorSystemImpl._start(ActorSystem.scala:617) at akka.actor.ActorSystemImpl.start(ActorSystem.scala:634) at akka.actor.ActorSystem$.apply(ActorSystem.scala:142) at akka.actor.ActorSystem$.apply(ActorSystem.scala:119) at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121) at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53) at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:52) at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2024) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141) at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2015) at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:55) at org.apache.spark.SparkEnv$.create(SparkEnv.scala:266) at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:217) at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:186) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) ... 4 more root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# cat work/app-20170408062155-0015/0/stdout root@slave02:/usr/local/ProgramFiles/spark-1.6.2-bin-hadoop2.6# -------------------------------------------------------------------------------------------- 3. driver's log Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 17/04/08 21:21:44 INFO SparkContext: Running Spark version 1.6.2 17/04/08 21:21:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 17/04/08 21:21:45 INFO SecurityManager: Changing view acls to: root 17/04/08 21:21:45 INFO SecurityManager: Changing modify acls to: root 17/04/08 21:21:45 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 17/04/08 21:21:46 INFO Utils: Successfully started service 'sparkDriver' on port 37230. 17/04/08 21:21:47 INFO Slf4jLogger: Slf4jLogger started 17/04/08 21:21:47 INFO Remoting: Starting remoting 17/04/08 21:21:48 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.0.191:43974] 17/04/08 21:21:48 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 43974. 17/04/08 21:21:48 INFO SparkEnv: Registering MapOutputTracker 17/04/08 21:21:48 INFO SparkEnv: Registering BlockManagerMaster 17/04/08 21:21:48 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-ef79b656-b7f4-4cb3-be3e-0f8bb61baa9d 17/04/08 21:21:48 INFO MemoryStore: MemoryStore started with capacity 431.3 MB 17/04/08 21:21:48 INFO SparkEnv: Registering OutputCommitCoordinator 17/04/08 21:21:54 INFO Utils: Successfully started service 'SparkUI' on port 4040. 17/04/08 21:21:54 INFO SparkUI: Started SparkUI at http://192.168.0.191:4040 17/04/08 21:21:54 INFO AppClient$ClientEndpoint: Connecting to master spark://master:7077... 17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20170408062155-0015 17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/0 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/0 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/1 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/1 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/2 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/2 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/3 on worker-20170408024004-192.168.0.182-46466 (192.168.0.182:46466) with 1 cores 17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/3 on hostPort 192.168.0.182:46466 with 1 cores, 64.0 MB RAM 17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/4 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/4 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/5 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/5 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/6 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/6 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 17/04/08 21:21:55 INFO AppClient$ClientEndpoint: Executor added: app-20170408062155-0015/7 on worker-20170408024003-192.168.0.181-45183 (192.168.0.181:45183) with 1 cores 17/04/08 21:21:55 INFO SparkDeploySchedulerBackend: Granted executor ID app-20170408062155-0015/7 on hostPort 192.168.0.181:45183 with 1 cores, 64.0 MB RAM 17/04/08 21:21:55 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42255. 17/04/08 21:21:55 INFO NettyBlockTransferService: Server created on 42255 17/04/08 21:21:56 INFO BlockManagerMaster: Trying to register BlockManager 17/04/08 21:21:57 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.191:42255 with 431.3 MB RAM, BlockManagerId(driver, 192.168.0.191, 42255) 17/04/08 21:21:57 INFO BlockManagerMaster: Registered BlockManager 17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/0 is now RUNNING 17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/1 is now RUNNING 17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/2 is now RUNNING 17/04/08 21:21:58 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/3 is now RUNNING 17/04/08 21:22:00 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/4 is now RUNNING 17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/5 is now RUNNING 17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/6 is now RUNNING 17/04/08 21:22:01 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/7 is now RUNNING 17/04/08 21:22:03 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0 17/04/08 21:22:05 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 107.7 KB, free 107.7 KB) 17/04/08 21:22:06 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 9.8 KB, free 117.5 KB) 17/04/08 21:22:06 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.191:42255 (size: 9.8 KB, free: 431.2 MB) 17/04/08 21:22:06 INFO SparkContext: Created broadcast 0 from textFile at Test01WordCount.scala:30 17/04/08 21:22:21 INFO FileInputFormat: Total input paths to process : 1 17/04/08 21:22:21 INFO SparkContext: Starting job: foreach at Test01WordCount.scala:33 17/04/08 21:22:21 INFO DAGScheduler: Got job 0 (foreach at Test01WordCount.scala:33) with 2 output partitions 17/04/08 21:22:21 INFO DAGScheduler: Final stage: ResultStage 0 (foreach at Test01WordCount.scala:33) 17/04/08 21:22:21 INFO DAGScheduler: Parents of final stage: List() 17/04/08 21:22:21 INFO DAGScheduler: Missing parents: List() 17/04/08 21:22:21 INFO DAGScheduler: Submitting ResultStage 0 (hdfs://master/970655147/input/01WordCount/ MapPartitionsRDD[1] at textFile at Test01WordCount.scala:30), which has no missing parents 17/04/08 21:22:21 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.0 KB, free 120.5 KB) 17/04/08 21:22:21 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1842.0 B, free 122.3 KB) 17/04/08 21:22:21 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.0.191:42255 (size: 1842.0 B, free: 431.2 MB) 17/04/08 21:22:21 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006 17/04/08 21:22:21 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (hdfs://master/970655147/input/01WordCount/ MapPartitionsRDD[1] at textFile at Test01WordCount.scala:30) 17/04/08 21:22:21 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks 17/04/08 21:22:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:23:04 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:23:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:23:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:23:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:23:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:24:02 INFO AppClient$ClientEndpoint: Executor updated: app-20170408062155-0015/1 is now EXITED (Command exited with code 1) 17/04/08 21:24:02 INFO SparkDeploySchedulerBackend: Executor app-20170408062155-0015/1 removed: Command exited with code 1 17/04/08 21:24:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:24:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:24:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:24:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:25:06 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:25:21 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:25:36 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:25:51 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/04/08 21:26:02 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Command exited with code 1)] in 1 attempts org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76) at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:370) at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.executorRemoved(SparkDeploySchedulerBackend.scala:144) at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:184) at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116) at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204) at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) ... 12 more -------------------------------------------------------------------------------------------- > ExecutorBackend blocked at "UserGroupInformation.doAs" > ------------------------------------------------------ > > Key: SPARK-20266 > URL: https://issues.apache.org/jira/browse/SPARK-20266 > Project: Spark > Issue Type: Question > Components: Project Infra > Affects Versions: 1.6.2 > Reporter: Jerry.X.He > Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org