[jira] [Updated] (OOZIE-2170) Oozie should automatically set configs to make Spark jobs show up in the Spark History Server

Robert Kanter (JIRA) Tue, 17 Mar 2015 12:07:18 -0700

     [ 
https://issues.apache.org/jira/browse/OOZIE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Robert Kanter updated OOZIE-2170:
---------------------------------
    Attachment: OOZIE-2170.patch

The patch adds a new {{SparkConfigurationService}}, which loads the 
"spark-defaults.conf" files defined by the new 
{{oozie.service.SparkConfigurationService.spark.configurations}} oozie-site 
property.  It works much like the way we load the Hadoop confs and the default 
action confs.
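
To illustrate the lookup, here is a minimal Python sketch (not the patch's Java code). It assumes the property value is a comma-separated list of {{AUTHORITY=CONF_DIR}} pairs with {{*}} as a fallback entry; that format, the function names, and the example paths are assumptions made for illustration, not something stated in this message.

```python
def parse_mappings(value):
    """Parse 'AUTHORITY=DIR,AUTHORITY=DIR' into a dict.

    The value format is an assumption for this sketch.
    """
    mappings = {}
    for entry in value.split(","):
        authority, sep, conf_dir = entry.strip().partition("=")
        authority, conf_dir = authority.strip(), conf_dir.strip()
        if not sep or not authority or not conf_dir:
            continue  # skip empty or malformed entries
        mappings[authority.lower()] = conf_dir
    return mappings


def resolve_conf_dir(mappings, job_tracker):
    """Return the conf dir for an exact authority match, else the '*' entry, else None."""
    return mappings.get(job_tracker.strip().lower(), mappings.get("*"))


# Hypothetical hosts and paths, for illustration only.
mappings = parse_mappings("rm1:8032=/etc/spark/rm1, *=/etc/spark/default")
print(resolve_conf_dir(mappings, "rm1:8032"))
print(resolve_conf_dir(mappings, "other:8032"))
```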

When the master starts with "yarn" and the {{<job-tracker>}} matches something 
in {{oozie.service.SparkConfigurationService.spark.configurations}}, Oozie will 
inject the properties from spark-defaults.conf as {{--conf}} parameters in the 
{{<spark-opts>}} field.  
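
The injection step can be sketched roughly as follows, again in Python rather than the patch's Java. It assumes whitespace-separated key/value lines in spark-defaults.conf and assumes that a property the user already set in {{<spark-opts>}} wins over the file; the helper names and that precedence rule are illustrative assumptions, not taken from the patch.

```python
def parse_spark_defaults(text):
    """Read 'key value' lines, ignoring blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            props[parts[0]] = parts[1].strip()
    return props


def user_conf_keys(spark_opts):
    """Collect the property names already passed as '--conf key=value'."""
    tokens = spark_opts.split()
    return {tokens[i + 1].split("=", 1)[0]
            for i, tok in enumerate(tokens[:-1]) if tok == "--conf"}


def inject_conf(master, spark_opts, props):
    """Append '--conf key=value' for each property when the master starts with 'yarn'."""
    if not master.startswith("yarn"):
        return spark_opts
    existing = user_conf_keys(spark_opts)  # assumed rule: the user's value wins
    extra = [f"--conf {k}={v}" for k, v in props.items() if k not in existing]
    base = [spark_opts.strip()] if spark_opts.strip() else []
    return " ".join(base + extra)


defaults = parse_spark_defaults(
    "# history server settings\n"
    "spark.eventLog.enabled true\n"
    "spark.eventLog.dir hdfs://NN:8020/user/spark/applicationHistory\n"
)
print(inject_conf("yarn-cluster", "--conf spark.eventLog.enabled=false", defaults))
```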

In addition to the unit tests, I also manually tested a variety of scenarios 
in a cluster (configs defined, invalid configs, {{*}}, duplicate 
properties, etc.).


> Oozie should automatically set configs to make Spark jobs show up in the 
> Spark History Server
> ---------------------------------------------------------------------------------------------
>
>                 Key: OOZIE-2170
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2170
>             Project: Oozie
>          Issue Type: Improvement
>          Components: action
>    Affects Versions: trunk
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: OOZIE-2170.patch
>
>
> If you use "yarn-cluster" for the Spark action's master, the Spark jobs don't 
> show up in the Spark History Server or properly link to it from the Spark AM.
> The user needs to set this in their Spark action in the workflow.xml:
> {code:xml}
> <spark-opts>--conf spark.yarn.historyServer.address=http://SPH:18088 --conf 
> spark.eventLog.dir=hdfs://NN:8020/user/spark/applicationHistory --conf 
> spark.eventLog.enabled=true</spark-opts>
> {code}
> It would be nice if Oozie did this automatically via some oozie-site.xml 
> config(s).  We could do something similar to how the Hadoop configs are set up, 
> where it will load a Spark .conf file from a directory based on the RM 
> specified in the <job-tracker>.
> While we're at it, it might be good to document how to use Spark on YARN:
> # Include the spark-assembly jar with your workflow (this is unfortunately 
> not published in Maven)
> # Specify "yarn-cluster" as the master
> Also, the Spark example should delete the output dir in {{<prepare>}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
