[ https://issues.apache.org/jira/browse/OOZIE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724083#comment-16724083 ]

Andras Salamon commented on OOZIE-3404:
---------------------------------------

By default, the {{SPARK_HOME}} environment variable is set to the local directory by 
[SparkActionExecutor|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/action/hadoop/SparkActionExecutor.java#L105-L122], 
which appends it to the {{oozie.launcher.mapred.child.env}} 
(and {{mapred.child.env}}) properties.

You can also modify these properties in the {{<configuration>}} section of the 
workflow definition without code modification:
{noformat}
<property>
  <name>oozie.launcher.mapred.child.env</name>
  <value>SPARK_HOME=...</value>
</property>
{noformat}
You might also need to add the same value to the 
{{oozie.launcher.yarn.app.mapreduce.am.env}} property.
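Putting both together, the {{<configuration>}} section of the spark action could 
look like this (a minimal sketch; {{/opt/spark}} is a hypothetical path, use 
whatever {{SPARK_HOME}} is valid on your cluster nodes):
{noformat}
<configuration>
  <!-- /opt/spark is a hypothetical example path -->
  <property>
    <name>oozie.launcher.mapred.child.env</name>
    <value>SPARK_HOME=/opt/spark</value>
  </property>
  <property>
    <name>oozie.launcher.yarn.app.mapreduce.am.env</name>
    <value>SPARK_HOME=/opt/spark</value>
  </property>
</configuration>
{noformat}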

 

> The env variable of SPARK_HOME needs to be set when running pySpark
> -------------------------------------------------------------------
>
>                 Key: OOZIE-3404
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3404
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Junfan Zhang
>            Assignee: Junfan Zhang
>            Priority: Major
>
> When we run Spark on a cluster, we rely on the Spark jars stored on HDFS; we 
> don't deploy Spark on the cluster servers themselves. As a result, running 
> pySpark as described in the Oozie documentation does not succeed.
> Currently I have added the {{SPARK_HOME}} environment variable to the 
> {{SparkMain}} class, and with that change it runs successfully.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)