Hi, Spark users:
I tried to test Spark on a standalone box, but ran into an issue whose root
cause I can't identify. I followed the documentation for deploying Spark in a
standalone environment exactly.
1) I checked out the Spark source code for release 1.1.0.
2) I built Spark with the following command, which succeeded:
   ./make-distribution.sh -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests
3) I made sure I can ssh to localhost as myself using an ssh key.
4) I ran sbin/start-all.sh; it looked fine, and I saw two Java processes running.
5) I ran the following command:
   yzhang@yzhang-linux:/opt/spark-1.1.0-bin-hadoop2.4.0/bin$ ./spark-shell --master spark://yzhang-linux:7077
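As a sanity sketch of steps 3 and 4 (the ssh check and the two JVMs), something like the script below can be run before launching the shell. The BatchMode/ConnectTimeout flags and the exact daemon class names are my assumptions about a typical Linux setup, not part of the original report:

```shell
#!/bin/sh
# Pre-flight checks for the standalone deployment described above.

# Step 3: passwordless ssh to localhost (sbin/start-all.sh relies on it).
if ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true 2>/dev/null; then
    echo "ssh to localhost: ok"
else
    echo "ssh to localhost: failed (start-all.sh would prompt or fail)"
fi

# Step 4: after sbin/start-all.sh there should be two Spark JVMs,
# org.apache.spark.deploy.master.Master and org.apache.spark.deploy.worker.Worker.
count=$(ps -e -o args | grep -v grep | grep -c -E 'deploy\.(master\.Master|worker\.Worker)' || true)
echo "spark daemons running: $count (expected 2)"
```

If the count is not 2, the worker or master logs under logs/ usually say why before spark-shell is even involved.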
I saw the following messages, and then the shell exited by itself:
14/10/27 11:22:53 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.

scala> 14/10/27 11:23:13 INFO client.AppClient$ClientActor: Connecting to master spark://yzhang-linux:7077...
14/10/27 11:23:33 INFO client.AppClient$ClientActor: Connecting to master spark://yzhang-linux:7077...
14/10/27 11:23:53 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
14/10/27 11:23:53 ERROR scheduler.TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
Then I checked the log files and found the following messages in the master
log:
14/10/27 11:22:53 ERROR remote.EndpointWriter: dropping message [class akka.actor.SelectChildName] for non-local recipient [Actor[akka.tcp://sparkMaster@yzhang-linux:7077/]] arriving at [akka.tcp://sparkMaster@yzhang-linux:7077] inbound addresses are [akka.tcp://sparkMaster@yzhang-linux:7077]
14/10/27 11:23:13 ERROR remote.EndpointWriter: dropping message [class akka.actor.SelectChildName] for non-local recipient [Actor[akka.tcp://sparkMaster@yzhang-linux:7077/]] arriving at [akka.tcp://sparkMaster@yzhang-linux:7077] inbound addresses are [akka.tcp://sparkMaster@yzhang-linux:7077]
14/10/27 11:23:33 ERROR remote.EndpointWriter: dropping message [class akka.actor.SelectChildName] for non-local recipient [Actor[akka.tcp://sparkMaster@yzhang-linux:7077/]] arriving at [akka.tcp://sparkMaster@yzhang-linux:7077] inbound addresses are [akka.tcp://sparkMaster@yzhang-linux:7077]
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
14/10/27 11:23:53 INFO actor.LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%40192.168.240.8%3A63348-2#1992401281] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/10/27 11:23:53 ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkMaster@yzhang-linux:7077] -> [akka.tcp://sparkDriver@yzhang-linux:44017]: Error [Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: yzhang-linux/192.168.240.8:44017]
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
14/10/27 11:23:53 ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkMaster@yzhang-linux:7077] -> [akka.tcp://sparkDriver@yzhang-linux:44017]: Error [Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: yzhang-linux/192.168.240.8:44017]
14/10/27 11:23:53 ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkMaster@yzhang-linux:7077] -> [akka.tcp://sparkDriver@yzhang-linux:44017]: Error [Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: yzhang-linux/192.168.240.8:44017]
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
Does anyone know why this is happening? The Spark web UI looks normal, and
there is no error message in the worker log. This is a standalone box with no
firewall, and the machine can resolve its own hostname and IP without any
problem.
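In case it helps, here is a sketch of how the resolution and reachability claim can be double-checked. The hostname yzhang-linux and port 7077 come from this setup (substitute your own; the default here is localhost so the script runs anywhere), and /dev/tcp is a bash-only builtin, not an external tool:

```shell
#!/bin/bash
# Check that the box resolves its own hostname and can reach the master port.
host="${1:-localhost}"

# 1) Name resolution: prints the address the hostname maps to. If this shows
#    127.0.0.1 while the master bound 192.168.240.8 (or vice versa), the
#    driver and master may be advertising different addresses to each other.
getent hosts "$host"

# 2) TCP reachability of the master port, via bash's /dev/tcp builtin.
if (exec 3<>"/dev/tcp/$host/7077") 2>/dev/null; then
    echo "port 7077 on $host is reachable"
else
    echo "cannot connect to $host:7077"
fi
```

Running the same probe against the driver port from the master log (44017 above) would show whether the "Connection refused" is reproducible outside Spark.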
Thanks for your help.
Yong