Hello Spark Fans,

I am trying to run a Spark job via Oozie as a Java action. The Spark code is packaged as MySparkJob.jar, compiled using sbt assembly (excluding the Spark and Hadoop dependencies).
I am able to invoke the Spark job from any client using

    java -cp lib/MySparkJob.jar:lib/spark-0.9-assembly-cdh4.jar Test

where "Test" is the name of the main class in my jar.

We have a Spark cluster (say lab1) of 4 machines: 1 Spark master and 3 workers. Oozie is running on another server (say oozie). For Oozie, I created a /tmp/test_deploy_spark folder in HDFS that contains the workflow.xml and a lib folder containing:

a) MySparkJob.jar
b) spark-0.9-assembly-cdh4.jar (Spark assembled with CDH4)

The job launches successfully, but the mapper fails with the following error at the "val sc = new SparkContext(sparkConf)" line:

    [ERROR] [04/30/2014 22:25:15.440] [main] [Remoting] Remoting error: [Startup timed out] [
    akka.remote.RemoteTransportException: Startup timed out
        at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129)
        at akka.remote.Remoting.start(Remoting.scala:191)
        at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
        at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579)
        at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577)
        at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588)
        at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
        at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
        at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:96)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:126)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:139)

I have the following questions:

a) Setting a jar path in Spark so that the job can be scheduled by Oozie: when I use the setting "sparkConf.setJars(List("MySpark.jar"))" in my sparkConf, where am I expected to load MySpark.jar?

b) The error above seems to be arising from Akka, not from being unable to find the jar.

If anyone has tried to run a Spark job through Oozie, please let me know if you have any ideas.

Thanks!
Shivani

--
Software Engineer
Analytics Engineering Team @ Box
Mountain View, CA