Hi all, I now use IP addresses in my scripts: spark-env.sh and slaves contain the IP addresses of the master and slave nodes, respectively. However, I still have no luck. Here are the relevant log snippets:

Master node log:

14/07/08 10:56:19 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@172.16.48.41:7077] -> [akka.tcp://spark@localhost:35797]: Error [Association failed with [akka.tcp://spark@localhost:35797]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://spark@localhost:35797]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: localhost/127.0.0.1:35797]
14/07/08 10:56:19 INFO Master: akka.tcp://spark@localhost:35797 got disassociated, removing it.
14/07/08 10:56:19 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@172.16.48.41:7077] -> [akka.tcp://spark@localhost:35797]: Error [Association failed with [akka.tcp://spark@localhost:35797]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://spark@localhost:35797]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: localhost/127.0.0.1:35797]

Worker node log:

14/07/08 10:56:11 INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.9.x86_64/jre/bin/java" "-cp" "::/apps/software/spark-1.0.0-bin-hadoop1/conf:/apps/software/spark-1.0.0-bin-hadoop1/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/apps/hadoop/hadoop-conf" "-XX:MaxPermSize=128m" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://spark@localhost:35797/user/CoarseGrainedScheduler" "6" "pzxnvm2023.x.y.name.org" "4" "akka.tcp://sparkwor...@pzxnvm2023.x.y.name.or:34222/user/Worker" "app-20140708105602-0000"
14/07/08 10:56:11 ERROR EndpointWriter: AssociationError [akka.tcp://sparkwor...@pzxnvm2023.x.y.name.org:34222] -> [akka.tcp://sparkexecu...@pzxnvm2023.x.y.name.org:52485]: Error [Association failed with [akka.tcp://sparkexecu...@pzxnvm2023.x.y.name.org:52485]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkexecu...@pzxnvm2023.dcld.pldc.kp.org:52485]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: pzxnvm2023.x.y.name.org/172.16.48.51:52485]
14/07/08 10:56:13 INFO Worker: Executor app-20140708105602-0000/6 finished with state FAILED message Command exited with code 1 exitStatus 1
14/07/08 10:56:13 INFO Worker: Asked to launch executor app-20140708105602-0000/8 for ApproxStrMatch