Hi,

I have set up a Spark 0.9.2 standalone cluster on CDH5 using the pre-built Spark distribution archive for Hadoop 2. I am not using the spark-ec2 scripts because I am not on the EC2 cloud.

Spark-shell seems to be working properly -- I can perform simple RDD operations, and the SparkPi standalone example runs fine when launched via `run-example`. The web UI shows all workers connected.

However, my standalone Scala application gets "connection refused" messages. I think this has something to do with configuration, because spark-shell and SparkPi work fine. I verified that .setMaster and .setSparkHome are set properly within the Scala app.
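
For reference, this is roughly how the SparkContext is created in my app (the master URL is the real one printed in the output below; the Spark home path is a placeholder standing in for the actual directory on my nodes):
---------------
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the driver setup; setMaster and setSparkHome are the two calls I verified.
// "/opt/spark" is a placeholder for the pre-built Spark 0.9.2 directory on my machines.
val conf = new SparkConf()
  .setAppName("MovieLensALS")
  .setMaster("spark://master:7077")
  .setSparkHome("/opt/spark")

val sc = new SparkContext(conf)
---------------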

Is there anything else in the configuration of a standalone Scala app on Spark that I am missing?
I would very much appreciate any clues.

Specifically, I am trying to run the MovieLensALS.scala example from the AMPCamp big data mini course (http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html).
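
In case it matters, the sbt build for the project is essentially the one from the mini course. Roughly (the Scala patch version is my guess; the name, version and Scala 2.10 match the jar name and paths in the output below):
---------------
// build.sbt (sketch)
name := "movielens-als"

version := "0.0"

scalaVersion := "2.10.4"

// MLlib pulls in spark-core transitively
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "0.9.2"
---------------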

Here is the error I get when I try to run the compiled jar:
---------------
root@master:~/machine-learning/scala# sbt/sbt package "run /movielens/medium"
Launching sbt from sbt/sbt-launch-0.12.4.jar
[info] Loading project definition from /root/training/machine-learning/scala/project
[info] Set current project to movielens-als (in build file:/root/training/machine-learning/scala/)
[info] Compiling 1 Scala source to /root/training/machine-learning/scala/target/scala-2.10/classes...
[warn] there were 2 deprecation warning(s); re-run with -deprecation for details
[warn] one warning found
[info] Packaging /root/training/machine-learning/scala/target/scala-2.10/movielens-als_2.10-0.0.jar ...
[info] Done packaging.
[success] Total time: 6 s, completed Oct 2, 2014 1:19:00 PM
[info] Running MovieLensALS /movielens/medium
master = spark://master:7077
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/10/02 13:19:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
HERE
THERE
14/10/02 13:19:02 INFO FileInputFormat: Total input paths to process : 1
14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 0 on host2: remote Akka client disassociated
14/10/02 13:19:03 WARN TaskSetManager: Lost TID 1 (task 0.0:1)
14/10/02 13:19:03 WARN TaskSetManager: Lost TID 0 (task 0.0:0)
14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 4 on host5: remote Akka client disassociated
14/10/02 13:19:03 WARN TaskSetManager: Lost TID 3 (task 0.0:1)
14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 1 on host4: remote Akka client disassociated
14/10/02 13:19:03 WARN TaskSetManager: Lost TID 2 (task 0.0:0)
14/10/02 13:19:03 WARN TaskSetManager: Lost TID 4 (task 0.0:1)
14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 3 on host3: remote Akka client disassociated
14/10/02 13:19:03 WARN TaskSetManager: Lost TID 6 (task 0.0:0)
14/10/02 13:19:03 ERROR TaskSchedulerImpl: Lost executor 2 on host1: remote Akka client disassociated
14/10/02 13:19:03 WARN TaskSetManager: Lost TID 5 (task 0.0:1)
14/10/02 13:19:03 WARN TaskSetManager: Lost TID 7 (task 0.0:0)
14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 6 on host4: remote Akka client disassociated
14/10/02 13:19:04 WARN TaskSetManager: Lost TID 8 (task 0.0:0)
14/10/02 13:19:04 WARN TaskSetManager: Lost TID 9 (task 0.0:1)
14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 5 on host2: remote Akka client disassociated
14/10/02 13:19:04 WARN TaskSetManager: Lost TID 10 (task 0.0:1)
14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 7 on host5: remote Akka client disassociated
14/10/02 13:19:04 WARN TaskSetManager: Lost TID 11 (task 0.0:0)
14/10/02 13:19:04 WARN TaskSetManager: Lost TID 12 (task 0.0:1)
14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 8 on host3: remote Akka client disassociated
14/10/02 13:19:04 WARN TaskSetManager: Lost TID 13 (task 0.0:1)
14/10/02 13:19:04 ERROR TaskSchedulerImpl: Lost executor 9 on host1: remote Akka client disassociated
14/10/02 13:19:04 WARN TaskSetManager: Lost TID 14 (task 0.0:0)
14/10/02 13:19:04 WARN TaskSetManager: Lost TID 15 (task 0.0:1)
14/10/02 13:19:05 ERROR AppClient$ClientActor: Master removed our application: FAILED; stopping client
14/10/02 13:19:05 WARN SparkDeploySchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
14/10/02 13:19:06 ERROR TaskSchedulerImpl: Lost executor 11 on host5: remote Akka client disassociated
14/10/02 13:19:06 WARN TaskSetManager: Lost TID 17 (task 0.0:0)
14/10/02 13:19:06 WARN TaskSetManager: Lost TID 16 (task 0.0:1)
---------------

And this is the error log from one of the workers:
---------------
14/10/02 13:19:05 INFO worker.Worker: Executor app-20141002131901-0002/9 finished with state FAILED message Command exited with code 1 exitStatus 1
14/10/02 13:19:05 INFO actor.LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40xxx.xx.xx.xx%3A57719-15#1504298502] was not delivered. [6] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/10/02 13:19:05 ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkWorker@host1:47421] -> [akka.tcp://sparkExecutor@host1:45542]: Error [Association failed with [akka.tcp://sparkExecutor@host1:45542]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@host1:45542]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: host1/xxx.xx.xx.xx:45542
]
14/10/02 13:19:05 ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkWorker@host1:47421] -> [akka.tcp://sparkExecutor@host1:45542]: Error [Association failed with [akka.tcp://sparkExecutor@host1:45542]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@host1:45542]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: host1/xxx.xx.xx.xx:45542
]
14/10/02 13:19:05 ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkWorker@host1:47421] -> [akka.tcp://sparkExecutor@host1:45542]: Error [Association failed with [akka.tcp://sparkExecutor@host1:45542]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@host1:45542]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: host1/xxx.xx.xx.xx:45542
---------------

Thanks!
Irina
