Hello Spark Fans,

I am trying to run a Spark job via Oozie as a Java action. The Spark code
is packaged as MySparkJob.jar, built with sbt assembly (excluding the
Spark and Hadoop dependencies).

I can invoke the Spark job from any client using:

java -cp lib/MySparkJob.jar:lib/spark-0.9-assembly-cdh4.jar Test

where "Test" is the name of the main class in my jar.

We have a Spark cluster (say lab1) of 4 machines: 1 Spark master and 3
workers. Oozie is running on another server (say oozie). For the Oozie
job I created a /tmp/test_deploy_spark folder in HDFS, which contains the
following:

the workflow.xml
and a lib folder containing
     a) MySparkJob.jar
     b) spark-0.9-assembly-cdh4.jar (Spark assembled with CDH4)

The job launches successfully, but the mapper fails with the following
error at the "val sc = new SparkContext(sparkConf)" line:

[ERROR] [04/30/2014 22:25:15.440] [main] [Remoting] Remoting error: [Startup timed out] [

akka.remote.RemoteTransportException: Startup timed out
        at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129)
        at akka.remote.Remoting.start(Remoting.scala:191)
        at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
        at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579)
        at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577)
        at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588)
        at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
        at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
        at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:96)
        at org.apache.spark.SparkEnv$.create(SparkEnv.scala:126)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:139)

I have the following questions:

a) How should I set the jar path in Spark so that the job can be scheduled
by Oozie? When I use the following setting in my sparkConf:
"sparkConf.setJars(List("MySpark.jar"))", where am I expected to put
MySpark.jar?

b) The error above seems to come from Akka rather than from the jar not
being found. Is that right?
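
For reference, here is a minimal sketch of the kind of driver setup that (a)
refers to. It is not the actual job: the master URL "spark://lab1-master:7077"
is a placeholder host name, and the body is reduced to a trivial count so that
only the SparkConf / SparkContext construction is shown.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object Test {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
      .setMaster("spark://lab1-master:7077") // placeholder master URL (assumption)
      .setAppName("Test")
      .setJars(List("MySpark.jar"))          // relative path, as in question (a)

    // This is the line where the Akka "Startup timed out" error is raised.
    val sc = new SparkContext(sparkConf)
    try {
      // Trivial stand-in for the real job logic.
      println(sc.parallelize(1 to 10).count())
    } finally {
      sc.stop()
    }
  }
}
```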

If anyone has tried to run a Spark job through Oozie, please let me know if
you have any ideas.

Thanks!
Shivani

-- 
Software Engineer
Analytics Engineering Team@ Box
Mountain View, CA
