[ https://issues.apache.org/jira/browse/OOZIE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724083#comment-16724083 ]
Andras Salamon commented on OOZIE-3404:
---------------------------------------
By default, the {{SPARK_HOME}} environment variable is set to the local directory by
[SparkActionExecutor|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/action/hadoop/SparkActionExecutor.java#L105-L122].
According to the code, the executor appends it to the {{oozie.launcher.mapred.child.env}}
(and {{mapred.child.env}}) properties.
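To make the "appends" behaviour concrete, here is a minimal Java sketch of that kind of logic. It is modeled loosely on the linked code, not copied from it: a plain {{Map}} stands in for Hadoop's {{Configuration}}, the default entry {{SPARK_HOME=.}} (the container's working directory) is an assumption, the "skip if SPARK_HOME is already set" check is my reading rather than a verified detail, and {{SparkHomeEnvSketch}} and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class SparkHomeEnvSketch {
    // Property names as mentioned in the comment above.
    static final String MAPRED_CHILD_ENV = "mapred.child.env";
    static final String LAUNCHER_CHILD_ENV = "oozie.launcher." + MAPRED_CHILD_ENV;

    // Assumption: "." means the launcher container's local working directory.
    static final String DEFAULT_SPARK_HOME = "SPARK_HOME=.";

    /** True if the comma-separated KEY=VALUE list already defines the given key. */
    static boolean definesKey(String env, String key) {
        for (String entry : env.split(",")) {
            if (entry.trim().startsWith(key + "=")) {
                return true;
            }
        }
        return false;
    }

    /** Hypothetical helper: append the default SPARK_HOME unless one is already set. */
    static String withDefaultSparkHome(String env) {
        if (env == null || env.isEmpty()) {
            return DEFAULT_SPARK_HOME;
        }
        if (definesKey(env, "SPARK_HOME")) {
            return env; // keep the user's own SPARK_HOME
        }
        return env + "," + DEFAULT_SPARK_HOME;
    }

    /** Writes the result to both properties; the Map stands in for a Hadoop Configuration. */
    static void applyDefault(Map<String, String> conf) {
        String env = withDefaultSparkHome(conf.get(LAUNCHER_CHILD_ENV));
        conf.put(LAUNCHER_CHILD_ENV, env);
        conf.put(MAPRED_CHILD_ENV, env);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Example pre-existing entry, as a user might have set it in the workflow.
        conf.put(LAUNCHER_CHILD_ENV, "PYSPARK_PYTHON=python3");
        applyDefault(conf);
        System.out.println(conf.get(LAUNCHER_CHILD_ENV));
    }
}
```

Run with {{java SparkHomeEnvSketch.java}} (Java 11+), it prints {{PYSPARK_PYTHON=python3,SPARK_HOME=.}}. In this sketch, a {{SPARK_HOME}} entry you set yourself is left untouched rather than duplicated.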
You can also set these properties yourself in the {{<configuration>}} section of the
workflow definition, without modifying any code:
{noformat}
<property>
<name>oozie.launcher.mapred.child.env</name>
<value>SPARK_HOME=...</value>
</property>
{noformat}
You might also need to add the same value to the
{{oozie.launcher.yarn.app.mapreduce.am.env}} property.
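Since the two properties should carry the same {{SPARK_HOME}} value, one option is to generate both {{<property>}} entries from a single value so they cannot drift apart. A small Java sketch, assuming a hypothetical install path {{/opt/spark}} and a hypothetical class name {{LauncherEnvConfig}}; it does no XML escaping, so it assumes a plain filesystem path:

```java
import java.util.List;

class LauncherEnvConfig {
    // Property names from the comment; both get the same SPARK_HOME value.
    static final List<String> ENV_PROPERTIES = List.of(
            "oozie.launcher.mapred.child.env",
            "oozie.launcher.yarn.app.mapreduce.am.env");

    /** Renders the <configuration> block for a workflow action (no XML escaping). */
    static String render(String sparkHome) {
        StringBuilder xml = new StringBuilder("<configuration>\n");
        for (String name : ENV_PROPERTIES) {
            xml.append("  <property>\n")
               .append("    <name>").append(name).append("</name>\n")
               .append("    <value>SPARK_HOME=").append(sparkHome).append("</value>\n")
               .append("  </property>\n");
        }
        return xml.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical default path; pass the real location as the first argument.
        System.out.print(render(args.length > 0 ? args[0] : "/opt/spark"));
    }
}
```

Paste the printed block into the action's {{<configuration>}} section. If you already set other variables in either property, keep them in the same comma-separated value.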
> The env variable of SPARK_HOME needs to be set when running pySpark
> -------------------------------------------------------------------
>
> Key: OOZIE-3404
> URL: https://issues.apache.org/jira/browse/OOZIE-3404
> Project: Oozie
> Issue Type: Bug
> Reporter: Junfan Zhang
> Assignee: Junfan Zhang
> Priority: Major
>
> When we run Spark on a cluster, we rely on the Spark jars on HDFS; we don't
> deploy Spark on the cluster servers. As a result, running PySpark as described
> in the Oozie documentation fails.
> For now, I have added the {{SPARK_HOME}} environment variable to the
> {{sparkMain}} class, and with that change the job runs successfully.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)