[
https://issues.apache.org/jira/browse/OOZIE-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858502#comment-15858502
]
Satish Subhashrao Saley edited comment on OOZIE-2787 at 2/8/17 10:40 PM:
-
Reopening as there is a regression.
{code}
pi.py is under oozie.wf.application.path and workflow configuration is -
pyspark example
pi.py
${testConf}
pi.py#pi-renamed.py
{code}
With the change we added in this jira, it will run the Spark job with the
following parameters:
{code}
--master
yarn-cluster
--name
pyspark example
--conf
spark.driver.extraJavaOptions=-Dlog4j.configuration=spark-log4j.properties
--conf
spark.ui.view.acls=*
--queue
default
--conf
spark.executor.extraClassPath=$PWD/*
--conf
spark.driver.extraClassPath=$PWD/*
--conf
spark.yarn.security.tokens.hive.enabled=false
--conf
spark.yarn.security.tokens.hbase.enabled=false
--conf
spark.executor.extraJavaOptions=-Dlog4j.configuration=spark-log4j.properties
--properties-file
spark-defaults.conf
--files
<>
--conf
spark.yarn.jar=hdfs://localhost/share/spark/lib/spark-assembly.jar
--verbose
hdfs://localhost/user/saley/examples/apps/spark-yarn-cluster/pi.py#pi-renamed.py
10
{code}
The job fails saying -
{code}
2017-02-07 21:59:24,847 [Driver] ERROR
org.apache.spark.deploy.yarn.ApplicationMaster - User application exited with
status 2
2017-02-07 21:59:24,849 [Driver] INFO
org.apache.spark.deploy.yarn.ApplicationMaster - Final app status: FAILED,
exitCode: 2, (reason: User application exited with status 2)
python: can't open file 'pi.py#pi-renamed.py': [Errno 2] No such file or
directory
{code}
Spark does not understand the {{#}} sign.
Therefore, we need to pass in the direct path to the file.
At the same time, we also need to make sure that the application jar does not
get distributed twice.
Solution - pass the direct path for the application jar/py file if there is
a {{#}} sign (fragment) in the path. We can do so because the file is already
available in the launcher's local directory, i.e. the current directory. At the
same time, remove the application jar from the *--files* option.
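The solution above can be sketched roughly as follows. This is a hypothetical illustration, not the actual Oozie patch: the class and method names (`FragmentFix`, `localizedAppPath`, `dropAppFromFiles`) are made up for this example. It assumes that when the application path carries a {{#}} fragment, the launcher has already localized the file under the fragment name in its working directory, so that name can be handed to spark-submit directly, and the original path can be dropped from the *--files* list.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the fix described above: strip the '#fragment'
// from the application jar/py path and avoid shipping the file twice.
public class FragmentFix {

    // If the path has a fragment, the file is already available in the
    // launcher's current directory under the fragment name, so use that;
    // otherwise pass the path through unchanged.
    static String localizedAppPath(String appPath) {
        URI uri = URI.create(appPath);
        String fragment = uri.getFragment();
        return fragment != null ? fragment : appPath;
    }

    // Remove the application jar/py entry from the --files list so it is
    // not distributed a second time by the distributed cache.
    static List<String> dropAppFromFiles(List<String> files, String appPath) {
        List<String> kept = new ArrayList<>();
        for (String f : files) {
            if (!f.equals(appPath)) {
                kept.add(f);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String app =
            "hdfs://localhost/user/saley/examples/apps/spark-yarn-cluster/pi.py#pi-renamed.py";
        // prints pi-renamed.py
        System.out.println(localizedAppPath(app));
    }
}
```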
> Oozie distributes application jar twice making the spark job fail
> -
>
> Key: OOZIE-2787
> URL: https://issues.apache.org/jira/browse/OOZIE-2787
> Project: Oozie
> Issue Type: Bug
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Attachments: OOZIE-2787-1.patch, OOZIE-2787-2.patch,
> OOZIE-2787-3.patch, OOZIE-2787-4.patch, OOZIE-2787-5.patch
>
>
> Oozie adds the application jar to the list of files to be uploaded to
> distributed cache. Since this gets added twice, the job fails. This is
> observ