Hi everyone,

We have been successfully deploying Spark jobs through Oozie using the spark-action for over a year. Note, however, that we deploy to our on-premises Hadoop infrastructure, not EC2.

Our process is to build a fat jar containing our job and its dependencies, upload that jar to HDFS (where it is accessible to Oozie and, in our case, YARN), and then reference that HDFS location in the <jar> element of our spark-action.
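As a rough sketch of that step (the mvn goal assumes a shade/assembly plugin is configured in the project, and the HDFS paths and jar name below are placeholders, not our real ones):

```shell
# Build the fat jar containing the job code plus its dependencies
mvn clean package

# Copy it to an HDFS path readable by both Oozie and YARN
hdfs dfs -mkdir -p /user/<USERNAME>/workflows/<JAR VERSION>/<JOB NAME>/<WORKFLOW NAME>/lib
hdfs dfs -put -f target/<JOB NAME>-jar-with-dependencies.jar \
    /user/<USERNAME>/workflows/<JAR VERSION>/<JOB NAME>/<WORKFLOW NAME>/lib/
```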

Here's our spark-action: 

<spark xmlns="uri:oozie:spark-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <master>yarn-cluster</master>
    <name>SparkJob</name>
    <class>com.dealer.rtb.spark.SparkJob</class>
    <jar>*** YOUR HDFS LOCATION HERE ***</jar>
    <spark-opts>--conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.initialExecutors=3 --conf spark.executor.memory=8g --conf spark.shuffle.service.enabled=true --conf spark.eventLog.overwrite=true --conf spark.dynamicAllocation.maxExecutors=6 --conf spark.dynamicAllocation.executorIdleTimeout=900 --conf spark.metrics.conf=/etc/spark.metrics.properties</spark-opts>
</spark>
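For completeness, the ${jobTracker} and ${nameNode} variables referenced above are typically supplied in the job.properties file submitted alongside the workflow. A minimal sketch (the host names are placeholders; oozie.use.system.libpath=true tells Oozie to add the Spark sharelib to the job classpath):

```
nameNode=hdfs://nameservice1
jobTracker=resourcemanager.example.com:8032
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/workflows/spark-job
```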


I should also note that we have built a custom Maven plugin (not open-source, sorry) for publishing our job's coordinator and workflow definitions, along with its jar, to the cluster. In our case, it publishes the jar to the following location:
hdfs://nameservice1/user/<USERNAME>/workflows/<JAR VERSION>/<JOB NAME>/<WORKFLOW NAME>/lib/

My understanding is that the contents of the lib directory are, by default, added to the classpath of the job. 
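To make that concrete, the published application directory ends up looking roughly like this (file names are illustrative); Oozie automatically adds any jars it finds in the workflow's lib/ directory to the classpath of the launched job:

```
/user/<USERNAME>/workflows/<JAR VERSION>/<JOB NAME>/<WORKFLOW NAME>/
    workflow.xml
    coordinator.xml
    lib/
        <JOB NAME>-jar-with-dependencies.jar
```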

Hope that helps. 

Joshua Dickerson
Senior Developer
Advertising - Real-Time Bidding

p: 888.894.8989

Dealer.com

A Cox Automotive Brand

From: Chandeep Singh [c...@chandeep.com]
Sent: Monday, March 07, 2016 11:02 AM
To: Neelesh Salian
Cc: Benjamin Kim; Deepak Sharma; Divya Gehlot; user @spark; u...@hadoop.apache.org
Subject: Re: Steps to Run Spark Scala job from Oozie on EC2 Hadoop cluster

As a work around you could put your spark-submit statement in a shell script and then use Oozie’s SSH action to execute that script.
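A minimal sketch of that workaround (the host, paths, and class name are placeholders): the script simply wraps spark-submit, and the SSH action runs it on a node where the Spark client is installed.

```shell
#!/usr/bin/env bash
# run-spark-job.sh -- executed on the remote host by Oozie's SSH action.
# Adjust master, class, and jar location for your environment.
spark-submit \
    --master yarn-cluster \
    --class com.example.SparkJob \
    hdfs:///user/<USERNAME>/lib/spark-job-assembly.jar "$@"
```

The corresponding action in the workflow would then look something like:

```
<action name="spark-via-ssh">
    <ssh xmlns="uri:oozie:ssh-action:0.1">
        <host>oozie@edge-node.example.com</host>
        <command>/opt/scripts/run-spark-job.sh</command>
        <capture-output/>
    </ssh>
    <ok to="end"/>
    <error to="fail"/>
</action>
```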

On Mar 7, 2016, at 3:58 PM, Neelesh Salian <nsal...@cloudera.com> wrote:

Hi Divya,

This link should have the details that you need to begin using the Spark Action on Oozie:

Thanks.

On Mon, Mar 7, 2016 at 7:52 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
To comment…

At my company, we have not gotten it to work in any mode other than local. If we try any of the YARN modes, it fails with a "file does not exist" error when trying to locate the executable jar. I mentioned this to the Hue users group, which we used for this, and they replied that the Spark action is a very basic implementation and that they will be writing their own for production use.

That’s all I know...

On Mar 7, 2016, at 1:18 AM, Deepak Sharma <deepakmc...@gmail.com> wrote:

There is Spark action defined for oozie workflows.
Though I am not sure if it supports only Java SPARK jobs or Scala jobs as well.
Thanks
Deepak

On Mon, Mar 7, 2016 at 2:44 PM, Divya Gehlot <divya.htco...@gmail.com> wrote:
Hi,

Could somebody help me by providing the steps /redirect me  to blog/documentation on how to run Spark job written in scala through Oozie.

Would really appreciate the help.



Thanks,
Divya 



--
Neelesh Srinivas Salian
Customer Operations Engineer



