Hi everyone,
Our process is to build a fat-jar with our job and dependencies, upload that jar to HDFS (accessible by Oozie and, in our case, YARN), and then reference that HDFS location in our spark-action in the <jar> block.
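For anyone scripting the upload step by hand, here is a minimal sketch in Python. It only assembles the `hdfs dfs` commands rather than running them. The local jar path, the target directory, and the function name are hypothetical placeholders, and it assumes the `hdfs` CLI is available on the machine doing the publishing.

```python
import shlex


def upload_commands(local_jar, hdfs_dir):
    """Return the hdfs CLI commands that would publish a fat jar to HDFS.

    Both arguments are illustrative; substitute your own build output
    and target directory.
    """
    return [
        # Create the target directory (and any missing parents).
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        # Copy the jar, overwriting any previous copy with the same name.
        ["hdfs", "dfs", "-put", "-f", local_jar, hdfs_dir],
    ]


# Hypothetical example values, printed as shell-safe command lines.
for cmd in upload_commands("target/spark-job-assembly.jar", "/user/someuser/jobs/lib"):
    print(" ".join(shlex.quote(part) for part in cmd))
```

The printed lines can be run as-is, or passed to `subprocess.run` if you prefer to do the upload from Python.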
Here's our spark-action:
<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn-cluster</master>
    <name>SparkJob</name>
    <class>com.dealer.rtb.spark.SparkJob</class>
    <jar>*** YOUR HDFS LOCATION HERE ***</jar>
    <spark-opts>--conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.initialExecutors=3 --conf spark.executor.memory=8g --conf spark.shuffle.service.enabled=true --conf spark.eventLog.overwrite=true --conf spark.dynamicAllocation.maxExecutors=6 --conf spark.dynamicAllocation.executorIdleTimeout=900 --conf spark.metrics.conf=/etc/spark.metrics.properties</spark-opts>
</spark>

I should also note that we have built a custom Maven plugin (not open-source, sorry) for publishing our job (its coordinator, workflow, and jar) to the cluster. In our case, it publishes the jar to the following location:
hdfs://nameservice1/user/<USERNAME>/workflows/<JAR VERSION>/<JOB NAME>/<WORKFLOW NAME>/lib/
My understanding is that the contents of the lib directory are, by default, added to the classpath of the job.
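To make the layout above concrete, here is a small sketch in Python that composes the lib directory path from its parts. The function name and the example values are hypothetical, and `hdfs://nameservice1` is simply the name service from the path above; this is an illustration of the layout, not the plugin itself.

```python
def workflow_lib_dir(username, jar_version, job_name, workflow_name,
                     name_service="hdfs://nameservice1"):
    """Build the lib/ directory path for a published workflow.

    Mirrors the layout:
    <name service>/user/<USERNAME>/workflows/<JAR VERSION>/<JOB NAME>/<WORKFLOW NAME>/lib/
    """
    parts = [name_service, "user", username, "workflows",
             jar_version, job_name, workflow_name, "lib"]
    # Strip stray slashes from each part so the joined path has no doubles.
    return "/".join(part.strip("/") for part in parts) + "/"


# Hypothetical example values.
print(workflow_lib_dir("someuser", "1.0.0", "SparkJob", "daily"))
```

Placing the jar in a `lib/` directory next to the workflow definition is what lets Oozie pick it up without extra classpath configuration.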
Hope that helps.
From: Chandeep Singh [c...@chandeep.com]
Sent: Monday, March 07, 2016 11:02 AM
To: Neelesh Salian
Cc: Benjamin Kim; Deepak Sharma; Divya Gehlot; user @spark; u...@hadoop.apache.org
Subject: Re: Steps to Run Spark Scala job from Oozie on EC2 Hadoop clsuter

As a workaround, you could put your spark-submit statement in a shell script and then use Oozie’s SSH action to execute that script.
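As a rough illustration of what such a spark-submit statement might look like, the Python sketch below assembles one from a class name, a jar location, and a set of `--conf` options. The function name and the jar path are hypothetical placeholders; the class name and the configuration keys are taken from the spark-action in this thread, and only a few of its options are shown.

```python
import shlex


def spark_submit_command(main_class, jar, conf, master="yarn-cluster"):
    """Assemble a spark-submit argument list (a sketch, not a full wrapper)."""
    cmd = ["spark-submit", "--master", master, "--class", main_class]
    for key, value in conf.items():
        # Each Spark property is passed as its own --conf key=value pair.
        cmd += ["--conf", f"{key}={value}"]
    # The application jar comes after all options.
    cmd.append(jar)
    return cmd


conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
    "spark.executor.memory": "8g",
}
# The jar location below is a placeholder, not a real path.
cmd = spark_submit_command("com.dealer.rtb.spark.SparkJob",
                           "hdfs:///path/to/your-job.jar", conf)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed line is the kind of statement that could be saved in the shell script that the SSH action runs.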