[Adding the dev list since this is probably a bug, but I'm not sure how to
reproduce it so that I can open a bug report.]

Hi,

I have a standalone Spark 1.4.0 cluster with hundreds of applications
running every day.

From time to time, an application crashes with the error shown below.
At the same time (and also afterwards), other applications keep running,
so I can safely assume the master and workers are working.

1. Why is there a NullPointerException? (I can't trace the Scala stack
trace back to the code, but an NPE is usually an obvious bug, even if the
underlying cause is a network error.)
2. Why can't it connect to the master? (If it's a network timeout, how can
I increase it? I see the values are hardcoded inside AppClient.)
3. How can I recover from this error?
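
On question 3, here is a minimal sketch of the kind of workaround that might
apply, assuming the failure surfaces as an exception thrown from the context
constructor (as the IllegalStateException in the trace below suggests): retry
the construction a few times with a growing delay. StartupRetry, retry, and
the attempt/delay parameters are hypothetical names for illustration, not
part of Spark, and this does nothing about the NPE itself.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not a Spark API: retries a factory call with
// exponential backoff between failed attempts.
final class StartupRetry {
    private StartupRetry() {}

    /**
     * Calls factory up to maxAttempts times. After each failed attempt
     * (except the last) it sleeps, doubling the delay every time.
     * Rethrows the last exception if every attempt fails.
     */
    static <T> T retry(Callable<T> factory, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        long delayMs = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return factory.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                    delayMs *= 2;
                }
            }
        }
        throw last;
    }
}
```

With Spark on the classpath, the call site could look like
`JavaSparkContext sc = StartupRetry.retry(() -> new JavaSparkContext(conf), 3, 5000L);`
where the attempt count and delay are arbitrary. This only covers a failure
during startup; it would not help an application that is killed later.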


  ERROR 01-11 15:32:54,991    SparkDeploySchedulerBackend - Application has been killed. Reason: All masters are unresponsive! Giving up. ERROR
  ERROR 01-11 15:32:55,087              OneForOneStrategy - ERROR logs/error.log
  java.lang.NullPointerException NullPointerException
      at org.apache.spark.deploy.client.AppClient$ClientActor$$anonfun$receiveWithLogging$1.applyOrElse(AppClient.scala:160)
      at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
      at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
      at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
      at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
      at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
      at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
      at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
      at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
      at org.apache.spark.deploy.client.AppClient$ClientActor.aroundReceive(AppClient.scala:61)
      at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
      at akka.actor.ActorCell.invoke(ActorCell.scala:487)
      at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
      at akka.dispatch.Mailbox.run(Mailbox.scala:220)
      at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
      at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
      at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
      at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
      at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
  ERROR 01-11 15:32:55,603                   SparkContext - Error initializing SparkContext. ERROR
  java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext
      at org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:103)
      at org.apache.spark.SparkContext.getSchedulingMode(SparkContext.scala:1501)
      at org.apache.spark.SparkContext.postEnvironmentUpdate(SparkContext.scala:2005)
      at org.apache.spark.SparkContext.<init>(SparkContext.scala:543)
      at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)


Thanks!

*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com