[ https://issues.apache.org/jira/browse/OOZIE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Kanter updated OOZIE-2170:
---------------------------------
    Attachment: OOZIE-2170.patch

Attaching the final version of the patch for reference.

> Oozie should automatically set configs to make Spark jobs show up in the Spark History Server
> ---------------------------------------------------------------------------------------------
>
>                 Key: OOZIE-2170
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2170
>             Project: Oozie
>          Issue Type: Improvement
>          Components: action
>    Affects Versions: trunk
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>             Fix For: trunk
>
>         Attachments: OOZIE-2170.patch, OOZIE-2170.patch, OOZIE-2170.patch
>
>
> If you use "yarn-cluster" for the Spark action's master, the Spark jobs don't
> show up in the Spark History Server or properly link to it from the Spark AM.
> The user needs to set this in their Spark action in the workflow.xml:
> {code:xml}
> <spark-opts>--conf spark.yarn.historyServer.address=http://SPH:18088 --conf spark.eventLog.dir=hdfs://NN:8020/user/spark/applicationHistory --conf spark.eventLog.enabled=true</spark-opts>
> {code}
> It would be nice if Oozie did this automatically via some oozie-site.xml
> config(s). We could do something similar to how the Hadoop configs are set up,
> where it will load a Spark .conf file from a directory based on the RM
> specified in the <job-tracker>.
> While we're at it, it might be good to document how to use Spark on YARN:
> # Include the spark-assembly jar with your workflow (this is unfortunately
> not published in Maven)
> # Specify "yarn-cluster" as the master
> Also, the Spark example should delete the output dir in {{<prepare>}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
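
For reference, here is a minimal sketch of a Spark action that combines the points from the description: "yarn-cluster" as the master, the three history-server settings passed through <spark-opts>, and a <prepare> step that deletes the output dir. It is not taken from the patch. The workflow and action names, the class, the jar path, the output path, and the schema versions are assumptions for illustration, and SPH and NN are the placeholder hosts used in the description.

```xml
<!-- Sketch only: names, class, jar path, and output path below are hypothetical. -->
<workflow-app xmlns="uri:oozie:workflow:0.5" name="spark-on-yarn-example">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <!-- Delete the output dir before the job runs, as suggested for the Spark example -->
                <delete path="${nameNode}/user/${wf:user()}/example/output"/>
            </prepare>
            <!-- Run the Spark driver inside YARN -->
            <master>yarn-cluster</master>
            <name>ExampleSparkJob</name>
            <class>com.example.ExampleSparkJob</class>
            <jar>${nameNode}/user/${wf:user()}/example/lib/example-spark-job.jar</jar>
            <!-- The settings the user currently has to supply by hand -->
            <spark-opts>--conf spark.yarn.historyServer.address=http://SPH:18088 --conf spark.eventLog.dir=hdfs://NN:8020/user/spark/applicationHistory --conf spark.eventLog.enabled=true</spark-opts>
            <arg>${nameNode}/user/${wf:user()}/example/input</arg>
            <arg>${nameNode}/user/${wf:user()}/example/output</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

In this sketch the spark-assembly jar is assumed to sit in the workflow's lib/ directory next to the application jar, which is one way to "include it with your workflow". With the change proposed in this issue, the <spark-opts> line would no longer need to be set by hand.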