[ 
https://issues.apache.org/jira/browse/OOZIE-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319852#comment-15319852
 ] 

Satish Subhashrao Saley edited comment on OOZIE-2547 at 6/8/16 4:15 PM:
------------------------------------------------------------------------

It is happening because spark-assembly.jar gets special treatment from Spark. In launch_container.sh, I see:

{{export SPARK_YARN_CACHE_FILES="hdfs://localhost/user/saley/.sparkStaging/application_1234_123/spark-assembly.jar#__spark__.jar}}

When we use spark-assembly.jar, it is copied to the container's current directory under the alias *__spark__.jar*, and the classpath setup refers to that alias:

{{export CLASSPATH="$PWD:$PWD/__spark__.jar:}}

But when we don't use spark-assembly.jar, we need to set the classpath explicitly.
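
For illustration, a minimal sketch of what setting the classpath explicitly could look like in launch_container.sh (the wildcard jar layout and variable names here are assumptions for illustration, not the actual patch):

{code}
# Hypothetical sketch: with no spark-assembly.jar (and hence no __spark__.jar
# alias), the Spark jars distributed to the container's working directory
# have to be put on the classpath explicitly, e.g.:
export CLASSPATH="$PWD:$PWD/*:$HADOOP_CONF_DIR:$CLASSPATH"
{code}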



> Add mapreduce.job.cache.files to spark action
> ---------------------------------------------
>
>                 Key: OOZIE-2547
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2547
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Minor
>         Attachments: OOZIE-2547-1.patch
>
>
> Currently, we pass jars using the --jars option while submitting a Spark job, 
> and we also set the spark.yarn.dist.files option in yarn-client mode. 
> Instead, we can use only the --files option and pass on the files listed in 
> mapreduce.job.cache.files. While doing so, we make sure that Spark won't make 
> another copy of a file that already exists on HDFS. We have seen issues where 
> files were copied multiple times, causing exceptions such as:
> {code}
> Diagnostics: Resource 
> hdfs://localhost/user/saley/.sparkStaging/application_1234_123/oozie-examples.jar
>  changed on src filesystem
> {code}
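> A hedged sketch of the kind of submission this would produce (the paths, file 
> list, and class name are hypothetical, borrowed from the Oozie examples for 
> illustration):
> {code}
> # Hypothetical: dependencies already in mapreduce.job.cache.files are passed
> # as hdfs:// URIs via --files, so Spark references them in place instead of
> # re-uploading them to .sparkStaging.
> spark-submit --master yarn --deploy-mode client \
>   --files hdfs://localhost/user/saley/apps/lib/dep.jar \
>   --class org.apache.oozie.example.SparkFileCopy \
>   oozie-examples.jar
> {code}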



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
