This looks like the Spark application is running into an abnormal state. From
the stack trace it appears the driver could not send requests to the AM. Can
you please check whether the AM is reachable, and whether there are any other
exceptions besides this one?
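
If the AM is just slow to come up or register, you could also lengthen the
ask timeout while you debug. spark.rpc.askTimeout is the setting named in
your exception; 600s below is only an illustrative value:

    spark-shell --master yarn --deploy-mode client \
      --conf spark.rpc.askTimeout=600s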

From my past tests, Spark's dynamic allocation can run into some corner
cases when an NM is gone or restarted. It would be better to check all the
logs (driver, AM, and executors) for clues about why this happened; it is
hard to tell the exact reason based only on the exception you pasted above.
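
One more thing worth double-checking, in case it is the root cause of the
containers exiting with code 1: dynamic allocation on YARN requires the
external shuffle service to be installed on every NodeManager, and the
shuffle service jar has to match your Spark version (a 1.6.x yarn-shuffle
jar will not work for Spark 2.0 executors). A minimal sketch of the usual
setup, with values illustrative only:

    # spark-defaults.conf (client side)
    spark.dynamicAllocation.enabled  true
    spark.shuffle.service.enabled    true

    <!-- yarn-site.xml on every NodeManager -->
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>

You would also need the Spark 2.0 yarn-shuffle jar on the NodeManager
classpath and an NM restart afterwards. This is just a guess based on the
symptoms, not something the trace confirms.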

On Wed, Aug 24, 2016 at 3:16 PM, Shane Lee <shane_y_...@yahoo.com.invalid>
wrote:

> Hello all,
>
> I am running Hadoop 2.6.4 with Spark 2.0 and I have been trying to get
> dynamic allocation to work, without success. I was able to get it to work
> with Spark 1.6.1, however.
>
> When I issue the command
> spark-shell --master yarn --deploy-mode client
>
> this is the error I see:
>
> 16/08/24 00:05:40 WARN NettyRpcEndpointRef: Error sending message [message = RequestExecutors(1,0,Map())] in 1 attempts
> org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
>         at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
>         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
>         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>         at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>         at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>         at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>         at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>         at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.doRequestTotalExecutors(YarnSchedulerBackend.scala:128)
>         at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:493)
>         at org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1482)
>         at org.apache.spark.ExecutorAllocationManager.start(ExecutorAllocationManager.scala:235)
>         at org.apache.spark.SparkContext$$anonfun$21.apply(SparkContext.scala:534)
>         at org.apache.spark.SparkContext$$anonfun$21.apply(SparkContext.scala:534)
>         at scala.Option.foreach(Option.scala:257)
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:534)
>         at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2256)
>         at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
>         at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
>         at scala.Option.getOrElse(Option.scala:121)
>         at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
>         at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
>         at $line3.$read$$iw$$iw.<init>(<console>:15)
>         at $line3.$read$$iw.<init>(<console>:31)
>         at $line3.$read.<init>(<console>:33)
>         at $line3.$read$.<init>(<console>:37)
>         at $line3.$read$.<clinit>(<console>)
>         at $line3.$eval$.$print$lzycompute(<console>:7)
>         at $line3.$eval$.$print(<console>:6)
>         at $line3.$eval.$print(<console>)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
>         at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
>         at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
>         at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
>         at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
>         at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
>         at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
>         at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
>         at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
>         at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
>         at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
>         at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
>         at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)
>         at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
>         at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
>         at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
>         at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:37)
>         at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:94)
>         at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
>         at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
>         at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
>         at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
>         at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
>         at org.apache.spark.repl.Main$.doMain(Main.scala:68)
>         at org.apache.spark.repl.Main$.main(Main.scala:51)
>         at org.apache.spark.repl.Main.main(Main.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>         at scala.concurrent.Await$.result(package.scala:190)
>         at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:81)
>         ... 63 more
> 16/08/24 00:05:40 WARN NettyRpcEndpointRef: Error sending message [message = RequestExecutors(1,0,Map())] in 1 attempts
> org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
>         at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
>         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
>         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>         at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>         at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>         at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>         at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>         at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply$mcV$sp(YarnSchedulerBackend.scala:271)
>         at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply(YarnSchedulerBackend.scala:271)
>         at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply(YarnSchedulerBackend.scala:271)
>         at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>         at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>         at scala.concurrent.Await$.result(package.scala:190)
>         at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:81)
>         ... 10 more
> 16/08/24 00:05:49 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1472021641742_0002_01_000002 on host: RAFDV-HOST01. Exit status: 1. Diagnostics: Exception from container-launch.
> Container id: container_1472021641742_0002_01_000002
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>         at org.apache.hadoop.util.Shell.run(Shell.java:455)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>         at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> Shell output:         1 file(s) moved.
>
>
> Container exited with a non-zero exit code 1
>
> If I turn off dynamic allocation, the command works just fine.
>
> Are there additional steps I need to take, beyond what's described in the
> Spark documentation?
>
> Your help is much appreciated,
>
> Shane
>
>
