[
https://issues.apache.org/jira/browse/OOZIE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Kanter updated OOZIE-2170:
---------------------------------
Attachment: OOZIE-2170.patch
The patch adds a new {{SparkConfigurationService}} which will load the
"spark-defaults.conf" files defined by the new
{{oozie.service.SparkConfigurationService.spark.configurations}} oozie-site
property. It works much like the way we load the Hadoop confs and the default
action confs.
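As a rough illustration of the lookup described above (not the code in the patch), here is a minimal Python sketch. It assumes the property value is a comma-separated list of AUTHORITY=SPARK_CONF_DIR entries with {{*}} as a fallback; that format, the function names, and the paths in the usage note are assumptions made for the example.

```python
def parse_spark_configurations(value):
    """Parse 'AUTHORITY=DIR,AUTHORITY=DIR' into a dict (format is assumed)."""
    mapping = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        authority, sep, conf_dir = entry.partition("=")
        authority, conf_dir = authority.strip(), conf_dir.strip()
        if not sep or not authority or not conf_dir:
            continue  # skip malformed entries instead of failing
        # A later duplicate of the same authority overwrites the earlier one.
        mapping[authority.lower()] = conf_dir
    return mapping


def resolve_conf_dir(mapping, job_tracker):
    """Return the conf dir for an exact authority match, else the '*' entry."""
    key = job_tracker.strip().lower()
    if key in mapping:
        return mapping[key]
    return mapping.get("*")  # None when no wildcard entry is configured
```

With a hypothetical value of "rm1:8032=/etc/spark/rm1,*=/etc/spark/default", a {{<job-tracker>}} of rm1:8032 would resolve to /etc/spark/rm1 and any other authority would fall back to /etc/spark/default.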
When the master starts with "yarn" and the {{<job-tracker>}} matches something
in {{oozie.service.SparkConfigurationService.spark.configurations}}, Oozie will
inject the properties from spark-defaults.conf as {{--conf}} parameters in the
{{<spark-opts>}} field.
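To make the injection step concrete, here is a small Python sketch of the idea (again, not the patch itself). It assumes a simplified spark-defaults.conf format of one whitespace-separated key/value pair per line with # comments; the function names are hypothetical. Putting the injected options ahead of the user's own options is a choice made for this sketch, not something stated above.

```python
def parse_spark_defaults(text):
    """Parse simplified spark-defaults.conf content into an ordered dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        parts = line.split(None, 1)  # key, then the rest of the line as value
        if len(parts) != 2:
            continue  # ignore lines that have a key but no value
        key, value = parts
        props[key] = value.strip()  # a duplicate key keeps the last value
    return props


def inject_spark_opts(master, spark_opts, defaults):
    """Prepend '--conf key=value' options when the master starts with 'yarn'."""
    if not master.startswith("yarn"):
        return spark_opts
    injected = " ".join(
        "--conf {}={}".format(key, value) for key, value in defaults.items()
    )
    # Drop empty pieces so no stray spaces are left behind.
    return " ".join(part for part in (injected, spark_opts.strip()) if part)
```

The sketch leaves out the {{<job-tracker>}} match and does no quoting, so a value containing spaces would need extra handling.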
In addition to the unit tests, I also tested out a variety of scenarios
manually in a cluster (configs defined, invalid configs, {{*}}, duplicate
properties, etc.).
> Oozie should automatically set configs to make Spark jobs show up in the
> Spark History Server
> ---------------------------------------------------------------------------------------------
>
> Key: OOZIE-2170
> URL: https://issues.apache.org/jira/browse/OOZIE-2170
> Project: Oozie
> Issue Type: Improvement
> Components: action
> Affects Versions: trunk
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Attachments: OOZIE-2170.patch
>
>
> If you use "yarn-cluster" for the Spark action's master, the Spark jobs don't
> show up in the Spark History Server or properly link to it from the Spark AM.
> The user needs to set this in their Spark action in the workflow.xml:
> {code:xml}
> <spark-opts>--conf spark.yarn.historyServer.address=http://SPH:18088 --conf
> spark.eventLog.dir=hdfs://NN:8020/user/spark/applicationHistory --conf
> spark.eventLog.enabled=true</spark-opts>
> {code}
> It would be nice if Oozie did this automatically via some oozie-site.xml
> config(s). We could do something similar to how the Hadoop configs are set up
> where it will load a Spark .conf file from a directory based on the RM
> specified in the <job-tracker>.
> While we're at it, it might be good to document how to use Spark on YARN:
> # Include the spark-assembly jar with your workflow (this is unfortunately
> not published in maven)
> # Specify "yarn-cluster" as the master
> Also, the Spark example should delete the output dir in {{<prepare>}}.