[jira] [Updated] (OOZIE-2170) Oozie should automatically set configs to make Spark jobs show up in the Spark History Server
[ https://issues.apache.org/jira/browse/OOZIE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shwetha G S updated OOZIE-2170:
-------------------------------
    Fix Version/s: (was: trunk)
                   4.2

Oozie should automatically set configs to make Spark jobs show up in the Spark History Server
---------------------------------------------------------------------------------------------
    Key: OOZIE-2170
    URL: https://issues.apache.org/jira/browse/OOZIE-2170
    Project: Oozie
    Issue Type: Improvement
    Components: action
    Affects Versions: trunk
    Reporter: Robert Kanter
    Assignee: Robert Kanter
    Fix For: 4.2
    Attachments: OOZIE-2170.patch, OOZIE-2170.patch, OOZIE-2170.patch

If you use yarn-cluster for the Spark action's master, the Spark jobs don't show up in the Spark History Server or properly link to it from the Spark AM. The user needs to set this in their Spark action in the workflow.xml:
{code:xml}
<spark-opts>--conf spark.yarn.historyServer.address=http://SPH18088 --conf spark.eventLog.dir=hdfs://NN:8020/user/spark/applicationHistory --conf spark.eventLog.enabled=true</spark-opts>
{code}
It would be nice if Oozie did this automatically via some oozie-site.xml config(s). We could do something similar to how the Hadoop configs are set up, where it will load a Spark .conf file from a directory based on the RM specified in the job-tracker.

While we're at it, it might be good to document how to use Spark on YARN:
# Include the spark-assembly jar with your workflow (this is unfortunately not published in Maven)
# Specify yarn-cluster as the master

Also, the Spark example should delete the output dir in {{prepare}}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
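One way to picture the directory lookup proposed in the description is sketched below: an RM address (the job-tracker value) is mapped to a Spark conf directory, with a {{*}} entry as a fallback. This is only an illustration under stated assumptions. The class name {{SparkConfLookupSketch}}, its methods, the example paths, and the comma-separated {{AUTHORITY=DIR}} mapping syntax are all hypothetical and are not taken from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: resolve a Spark conf directory from the job-tracker address.
// Not the code from OOZIE-2170.patch.
final class SparkConfLookupSketch {

    // Parse "AUTHORITY=DIR,AUTHORITY=DIR". This mapping syntax is an assumption.
    static Map<String, String> parseMappings(String value) {
        Map<String, String> mappings = new LinkedHashMap<>();
        for (String entry : value.split(",")) {
            String[] parts = entry.trim().split("=", 2);
            if (parts.length == 2) {
                String authority = parts[0].trim().toLowerCase(Locale.ROOT);
                String dir = parts[1].trim();
                // Skip malformed entries that lack either side of the '='.
                if (!authority.isEmpty() && !dir.isEmpty()) {
                    mappings.put(authority, dir);
                }
            }
        }
        return mappings;
    }

    // Exact match on the RM authority first, then the "*" fallback.
    // Returns null when neither is configured.
    static String resolveConfDir(Map<String, String> mappings, String jobTracker) {
        String dir = mappings.get(jobTracker.trim().toLowerCase(Locale.ROOT));
        return dir != null ? dir : mappings.get("*");
    }

    public static void main(String[] args) {
        Map<String, String> mappings =
                parseMappings("rm1:8032=/etc/spark/rm1, *=/etc/spark/default");
        System.out.println(resolveConfDir(mappings, "rm1:8032"));
        System.out.println(resolveConfDir(mappings, "other:8032"));
    }
}
```

Trying the exact authority before the wildcard lets one cluster override a site-wide default; whether the real service resolves matches in that order is not stated in this thread.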
[jira] [Updated] (OOZIE-2170) Oozie should automatically set configs to make Spark jobs show up in the Spark History Server
[ https://issues.apache.org/jira/browse/OOZIE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated OOZIE-2170:
---------------------------------
    Attachment: OOZIE-2170.patch

Attaching the final version of the patch for reference.

Oozie should automatically set configs to make Spark jobs show up in the Spark History Server
---------------------------------------------------------------------------------------------
    Key: OOZIE-2170
    URL: https://issues.apache.org/jira/browse/OOZIE-2170
    Project: Oozie
    Issue Type: Improvement
    Components: action
    Affects Versions: trunk
    Reporter: Robert Kanter
    Assignee: Robert Kanter
    Fix For: trunk
    Attachments: OOZIE-2170.patch, OOZIE-2170.patch, OOZIE-2170.patch
[jira] [Updated] (OOZIE-2170) Oozie should automatically set configs to make Spark jobs show up in the Spark History Server
[ https://issues.apache.org/jira/browse/OOZIE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated OOZIE-2170:
---------------------------------
    Attachment: OOZIE-2170.patch

Uploading a rebased patch in case that's the problem. Though it seems odd that HEAD didn't compile...

Oozie should automatically set configs to make Spark jobs show up in the Spark History Server
---------------------------------------------------------------------------------------------
    Key: OOZIE-2170
    URL: https://issues.apache.org/jira/browse/OOZIE-2170
    Project: Oozie
    Issue Type: Improvement
    Components: action
    Affects Versions: trunk
    Reporter: Robert Kanter
    Assignee: Robert Kanter
    Attachments: OOZIE-2170.patch, OOZIE-2170.patch
[jira] [Updated] (OOZIE-2170) Oozie should automatically set configs to make Spark jobs show up in the Spark History Server
[ https://issues.apache.org/jira/browse/OOZIE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated OOZIE-2170:
---------------------------------
    Attachment: OOZIE-2170.patch

The patch adds a new {{SparkConfigurationService}} which loads the spark-defaults.conf files defined by the new {{oozie.service.SparkConfigurationService.spark.configurations}} oozie-site property. It operates similarly to how we load the Hadoop conf and default action confs.

When the master starts with yarn and the {{job-tracker}} matches something in {{oozie.service.SparkConfigurationService.spark.configurations}}, Oozie will inject the properties from spark-defaults.conf as {{--conf}} parameters in the {{spark-opts}} field.

In addition to the unit tests, I also tested a variety of scenarios manually in a cluster (configs defined, invalid configs, {{*}}, duplicate properties, etc.).

Oozie should automatically set configs to make Spark jobs show up in the Spark History Server
---------------------------------------------------------------------------------------------
    Key: OOZIE-2170
    URL: https://issues.apache.org/jira/browse/OOZIE-2170
    Project: Oozie
    Issue Type: Improvement
    Components: action
    Affects Versions: trunk
    Reporter: Robert Kanter
    Assignee: Robert Kanter
    Attachments: OOZIE-2170.patch
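As a rough illustration of the injection step described in the patch comment above, the sketch below reads spark-defaults.conf-style text and appends each property as a {{--conf key=value}} argument when the master starts with yarn. It is a minimal sketch, not the patch itself: the class {{SparkOptsSketch}} and its methods are hypothetical, and the rule that a property already present in {{spark-opts}} is left alone is an assumption, since this thread does not say how the patch treats duplicate properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical sketch of turning spark-defaults.conf entries into --conf options.
// Not the SparkConfigurationService from OOZIE-2170.patch.
final class SparkOptsSketch {

    // java.util.Properties accepts whitespace, '=' or ':' between key and value,
    // which covers the usual "key value" layout of spark-defaults.conf.
    static Properties parseSparkDefaults(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Append "--conf key=value" for each default when the master starts with "yarn".
    // Assumption: a key the user already set in spark-opts is not overridden.
    static String injectConf(String master, String sparkOpts, Properties defaults) {
        String opts = sparkOpts == null ? "" : sparkOpts.trim();
        if (master == null || !master.startsWith("yarn")) {
            return opts;
        }
        StringBuilder sb = new StringBuilder(opts);
        // Sort the keys so the generated option string is deterministic.
        for (String key : new TreeSet<>(defaults.stringPropertyNames())) {
            if (opts.contains(key + "=")) {
                continue; // crude substring check, enough for a sketch
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("--conf ").append(key).append('=').append(defaults.getProperty(key));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Properties defaults = parseSparkDefaults(
                "spark.eventLog.enabled true\n"
                + "spark.eventLog.dir hdfs://NN:8020/user/spark/applicationHistory\n");
        System.out.println(injectConf("yarn-cluster", "", defaults));
    }
}
```

Leaving user-supplied values in place keeps a workflow's explicit {{spark-opts}} authoritative over site-wide defaults; the real service may resolve conflicts differently.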
[jira] [Updated] (OOZIE-2170) Oozie should automatically set configs to make Spark jobs show up in the Spark History Server
[ https://issues.apache.org/jira/browse/OOZIE-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated OOZIE-2170:
---------------------------------
    Summary: Oozie should automatically set configs to make Spark jobs show up in the Spark History Server (was: Oozie should automatically sets configs to make Spark jobs show up in the Spark History Server)

Oozie should automatically set configs to make Spark jobs show up in the Spark History Server
---------------------------------------------------------------------------------------------
    Key: OOZIE-2170
    URL: https://issues.apache.org/jira/browse/OOZIE-2170
    Project: Oozie
    Issue Type: Improvement
    Components: action
    Affects Versions: trunk
    Reporter: Robert Kanter
    Assignee: Robert Kanter