I have mucked around with this a little bit. The first step to make this happen is to build a fat jar. I wrote a quick blog post <http://myresearchdiaries.blogspot.com/2014/05/building-apache-spark-jars.html> documenting my learning curve w.r.t. that.

The next step is to schedule this as a java action. Since your code will need to reference the Spark as well as the Hadoop libraries, it is best to supply those in your java action. To do this, you will need to put these jars in the "lib" folder. So if <my-test-folder>/workflow.xml contains your java action, then <my-test-folder>/lib would contain the following jars:
a) your spark lib jar
b) your spark app jar

However, this is where I got stuck. I got some time-out errors thrown by akka when attempting to create a spark context. This could be due to either of the following two reasons:
a) The "setJars" function, which needs to be called before a spark context is created, is probably not finding the right jar. I am a little clueless on how to do this. As mentioned in the spark documentation <http://spark.apache.org/docs/0.9.1/spark-standalone.html>, we need to specify the jar explicitly. However, given that oozie copies everything into a tmp folder, I am not sure how to specify this path so that the data node that is executing "java -cp <path-to-fat-jar>:<path-to-libs> <mainclassname>" would know where to find the containing jar.
b) My oozie is running on one machine and attempting to launch the spark job on a different cluster. Maybe that's what the time-out error means. I still don't know.

So in summary, the limitations are:
a) Need to find a way to specify the path to the jar in the "setJars" function
b) Need to have oozie running on the same cluster as spark

I will keep you updated.

Shivani

On Thu, Apr 10, 2014 at 8:52 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:

> I dont think it'll do failure detection etc of spark job in Oozie as of
> yet. You should be able to trigger it from Oozie (worst case as a shell
> script).
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
> On Thu, Apr 10, 2014 at 2:56 AM, Konstantin Kudryavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
>
>> I believe you need to write custom action or engage java action
>> On Apr 10, 2014 12:11 AM, "Segerlind, Nathan L" <
>> nathan.l.segerl...@intel.com> wrote:
>>
>>> Howdy.
>>>
>>> Is it possible to initiate Spark jobs from Oozie (presumably as a java
>>> action)? If so, are there known limitations to this? And would anybody
>>> have a pointer to an example?
>>>
>>> Thanks,
>>>
>>> Nate
>>>
>>
>

-- 
Software Engineer
Analytics Engineering Team @ Box
Mountain View, CA
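
P.S. On the "setJars" path question above: one thing that might be worth trying is to have the main class work out at runtime which jar it was loaded from, instead of hard-coding a path that oozie's tmp folder will invalidate. Below is a minimal sketch of that idea using only the plain JDK. The class name JarLocator and the method containingJar are made up for illustration, and I am assuming (not verified under oozie) that the path it returns is one that "setJars" would accept on the node running the java action.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.security.CodeSource;

// Hypothetical helper (not part of Spark or Oozie): finds where a class was loaded from.
final class JarLocator {

    /**
     * Returns the absolute path of the jar (or class directory) that cls was
     * loaded from, or null if the JVM does not expose a file location for it.
     */
    public static String containingJar(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return null; // e.g. JDK classes loaded by the bootstrap class loader
        }
        try {
            return new File(source.getLocation().toURI()).getAbsolutePath();
        } catch (URISyntaxException | IllegalArgumentException e) {
            return null; // location is not a plain file: URI
        }
    }

    public static void main(String[] args) {
        // In a real job this would be the main class of the spark app jar.
        String jar = containingJar(JarLocator.class);
        System.out.println("candidate path for setJars: " + jar);
    }
}
```

If I remember right, Spark also has a helper called SparkContext.jarOfClass that is meant for the same purpose, so that may be the simpler route if it behaves under oozie. Either way this only addresses limitation (a); it does nothing for the cross-cluster time-out in (b).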