Hi, Spark Users:
I tried to test Spark on a standalone box, but ran into an issue whose root cause I 
can't figure out. I followed the documentation for deploying Spark in a standalone 
environment exactly:
1) I checked out the Spark source code for release 1.1.0.
2) I built Spark with the following command, which succeeded:
   ./make-distribution.sh -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests
3) I made sure that I can ssh to localhost as myself using an ssh key.
4) I ran sbin/start-all.sh; it looked fine, and I saw two Java processes running.
5) I ran the following command:
   yzhang@yzhang-linux:/opt/spark-1.1.0-bin-hadoop2.4.0/bin$ ./spark-shell --master spark://yzhang-linux:7077
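In short, the steps above can be condensed into the following shell sequence (a recap of what I ran, not a general recipe; the master URL uses my box's hostname, so substitute your own):

```shell
# Build a binary distribution from the Spark 1.1.0 source (run from the source root).
./make-distribution.sh -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests

# Start the standalone master and a worker on this box
# (run from the unpacked distribution, e.g. /opt/spark-1.1.0-bin-hadoop2.4.0).
sbin/start-all.sh

# Attach a shell to the standalone master.
bin/spark-shell --master spark://yzhang-linux:7077
```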
I then saw the following messages, after which the shell exited on its own:
14/10/27 11:22:53 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.

scala> 14/10/27 11:23:13 INFO client.AppClient$ClientActor: Connecting to master spark://yzhang-linux:7077...
14/10/27 11:23:33 INFO client.AppClient$ClientActor: Connecting to master spark://yzhang-linux:7077...
14/10/27 11:23:53 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
14/10/27 11:23:53 ERROR scheduler.TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
I then checked the log files and found the following messages in the master log:
14/10/27 11:22:53 ERROR remote.EndpointWriter: dropping message [class akka.actor.SelectChildName] for non-local recipient [Actor[akka.tcp://sparkMaster@yzhang-linux:7077/]] arriving at [akka.tcp://sparkMaster@yzhang-linux:7077] inbound addresses are [akka.tcp://sparkMaster@yzhang-linux:7077]
14/10/27 11:23:13 ERROR remote.EndpointWriter: dropping message [class akka.actor.SelectChildName] for non-local recipient [Actor[akka.tcp://sparkMaster@yzhang-linux:7077/]] arriving at [akka.tcp://sparkMaster@yzhang-linux:7077] inbound addresses are [akka.tcp://sparkMaster@yzhang-linux:7077]
14/10/27 11:23:33 ERROR remote.EndpointWriter: dropping message [class akka.actor.SelectChildName] for non-local recipient [Actor[akka.tcp://sparkMaster@yzhang-linux:7077/]] arriving at [akka.tcp://sparkMaster@yzhang-linux:7077] inbound addresses are [akka.tcp://sparkMaster@yzhang-linux:7077]
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
14/10/27 11:23:53 INFO actor.LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%40192.168.240.8%3A63348-2#1992401281] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/10/27 11:23:53 ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkMaster@yzhang-linux:7077] -> [akka.tcp://sparkDriver@yzhang-linux:44017]: Error [Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: yzhang-linux/192.168.240.8:44017]
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
14/10/27 11:23:53 ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkMaster@yzhang-linux:7077] -> [akka.tcp://sparkDriver@yzhang-linux:44017]: Error [Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: yzhang-linux/192.168.240.8:44017]
14/10/27 11:23:53 ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkMaster@yzhang-linux:7077] -> [akka.tcp://sparkDriver@yzhang-linux:44017]: Error [Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: yzhang-linux/192.168.240.8:44017]
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
Any idea why this is happening? The Spark web UI looks normal, and there are no 
error messages in the worker log. This is a standalone box with no firewall, and 
the hostname and IP resolve correctly from the box itself.
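For what it's worth, the name-resolution claim can be double-checked with something like the following (a minimal sketch; `$(hostname)` expands to `yzhang-linux` on my box):

```shell
# Show what the local hostname resolves to; this should print the box's own IP.
getent hosts "$(hostname)"

# Confirm the box answers on that address.
ping -c 1 "$(hostname)"
```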
Thanks for your help.
Yong                                      
