[ https://issues.apache.org/jira/browse/OOZIE-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319852#comment-15319852 ]
Satish Subhashrao Saley edited comment on OOZIE-2547 at 6/8/16 4:15 PM:
------------------------------------------------------------------------
It is happening because spark-assembly.jar gets special treatment from Spark. In launch_container.sh, I see:
{{export SPARK_YARN_CACHE_FILES="hdfs://localhost/user/saley/.sparkStaging/application_1234_123/spark-assembly.jar#__spark__.jar"}}
When we use spark-assembly.jar, it is copied into the container's current directory under the alias *__spark__.jar*, and the classpath then refers to that file:
{{export CLASSPATH="$PWD:$PWD/__spark__.jar:}}
But when we don't use spark-assembly.jar, we need to set the classpath explicitly.

> Add mapreduce.job.cache.files to spark action
> ---------------------------------------------
>
>                 Key: OOZIE-2547
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2547
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Minor
>         Attachments: OOZIE-2547-1.patch
>
>
> Currently, we pass jars using the --jars option while submitting a Spark job. We also
> add the spark.yarn.dist.files option in yarn-client mode.
> Instead of that, we can have only the --files option and pass on the files which
> are present in mapreduce.job.cache.files.
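To illustrate the alias mechanism described above, here is a minimal shell sketch of how the "#" fragment in a YARN cache-file URI maps to the localized file name and how the classpath entry is derived. The paths are the illustrative ones from the comment, not from a real cluster, and the parsing below only mimics what YARN's localizer does internally.

```shell
#!/bin/sh
# YARN localization treats the part after '#' as the alias: the file is
# downloaded from the URI and linked into the container's working
# directory under that alias name. (Paths are illustrative.)
cache_file="hdfs://localhost/user/saley/.sparkStaging/application_1234_123/spark-assembly.jar#__spark__.jar"

# Split the source URI from the alias at the '#'.
src_uri="${cache_file%%#*}"
alias_name="${cache_file##*#}"

# The launcher then builds the classpath against the localized alias,
# which is why $PWD/__spark__.jar appears in launch_container.sh.
CLASSPATH="$PWD:$PWD/$alias_name"

echo "$src_uri"
echo "$alias_name"
echo "$CLASSPATH"
```

Without the spark-assembly.jar cache entry there is no `__spark__.jar` link in the container directory, so the classpath entry above points at nothing and must be set some other way.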
> While doing so, we make sure that Spark won't make another copy of the files if they
> already exist on HDFS. We saw issues when files were copied multiple times, causing
> exceptions such as:
> {code}
> Diagnostics: Resource
> hdfs://localhost/user/saley/.sparkStaging/application_1234_123/oozie-examples.jar
> changed on src filesystem
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
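The approach proposed in the description can be sketched as the following shell fragment: referencing a jar by its existing hdfs:// URI in --files lets Spark reuse that copy rather than re-uploading it to .sparkStaging, which is what triggers the "changed on src filesystem" diagnostic. The jar path, main class, and application jar below are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical example: a file already on HDFS (as it would be via
# mapreduce.job.cache.files) is passed through --files so Spark does
# not make a second staging copy of it.
cache_files="hdfs://localhost/user/saley/oozie-examples.jar"

spark_cmd="spark-submit --master yarn --deploy-mode client --files $cache_files --class org.example.Main app.jar"

echo "$spark_cmd"
```

This only prints the command it would run; the point is that --files carries the hdfs:// URI straight through instead of Spark copying the file a second time.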