Hi,

I can't understand why every executor that gets created fails.

I have a Spark standalone cluster made up of 2 workers and 1 master.
My spark-env looks like this:

SPARK_MASTER_IP=192.168.2.11
SPARK_LOCAL_IP=192.168.2.11
SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=4"
SPARK_WORKER_CORES=4
SPARK_WORKER_MEMORY=6g

From the Spark logs, I get this:

15/11/04 20:36:35 WARN remote.ReliableDeliverySupervisor: Association
with remote system [akka.tcp://sparkDriver@172.26.71.5:61094] has
failed, address is now gated for [5000] ms. Reason: [Association
failed with [akka.tcp://sparkDriver@172.26.71.5:61094]] Caused by:
[Operation timed out: /172.26.71.5:61094]
Exception in thread "main" akka.actor.ActorNotFound: Actor not found
for: ActorSelection[Anchor(akka.tcp://sparkDriver@172.26.71.5:61094/),
Path(/user/CoarseGrainedScheduler)]
        at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
        at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
        at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
        at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:73)
        at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
        at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:120)
        at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
        at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
        at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:266)
        at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:533)
        at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:569)
        at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:559)
        at akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
        at akka.remote.EndpointWriter.postStop(Endpoint.scala:557)
        at akka.actor.Actor$class.aroundPostStop(Actor.scala:477)
        at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:411)
        at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
        at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
        at akka.actor.ActorCell.terminate(ActorCell.scala:369)
        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/11/04 20:36:35 INFO actor.LocalActorRef: Message
[akka.remote.EndpointWriter$AckIdleCheckTimer$] from
Actor[akka://driverPropsFetcher/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FsparkDriver%40172.26.71.5%3A61094-0/endpointWriter#-1769599826]
to 
Actor[akka://driverPropsFetcher/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FsparkDriver%40172.26.71.5%3A61094-0/endpointWriter#-1769599826]
was not delivered. [1] dead letters encountered. This logging can be
turned off or adjusted with configuration settings
'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
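
The "Operation timed out" line suggests the executor could not open a TCP
connection back to the driver at 172.26.71.5:61094. A rough way to check
that from a worker node, while the driver is still running, might be the
sketch below. The helper name can_connect is made up for illustration, and
the host and port are simply the ones from this particular log, so they
would need to be replaced with whatever address the current run reports.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks
        return False


# Hypothetical usage on a worker, with the address copied from the log above:
#   can_connect("172.26.71.5", 61094)
```

If this returns False on the workers, the problem is probably network
reachability between the 192.168.2.x machines and the driver's 172.26.71.5
address rather than anything in the executors themselves.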

I appreciate any kind of help.
