[ https://issues.apache.org/jira/browse/OOZIE-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317486#comment-15317486 ]
Satish Subhashrao Saley edited comment on OOZIE-2547 at 6/6/16 11:36 PM:
-------------------------------------------------------------------------

Hello [~rkanter] and [~rohini],

Could you please review the patch? I have removed the logic that populated {{spark.executor.extraClassPath}}, {{spark.driver.extraClassPath}}, {{--jars}} and {{spark.yarn.dist.files}}. Instead, we now pass the distributed cache files through {{--files}}. While doing so, I also make sure that the HDFS paths to those files are formed such that Spark won't make another copy. I have tested the patch locally as well as on clusters, and it seems to work fine with {{--master}} set to local, yarn-client and yarn-cluster.


was (Author: satishsaley):
Hello [~rkanter],

Could you please review the patch? I have removed the logic that populated {{spark.executor.extraClassPath}}, {{spark.driver.extraClassPath}}, {{--jars}} and {{spark.yarn.dist.files}}. Instead, we now pass the distributed cache files through {{--files}}. While doing so, I also make sure that the HDFS paths to those files are formed such that Spark won't make another copy. I have tested the patch locally as well as on clusters, and it seems to work fine with {{--master}} set to local, yarn-client and yarn-cluster.

> Add mapreduce.job.cache.files to spark action
> ---------------------------------------------
>
>                 Key: OOZIE-2547
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2547
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Minor
>         Attachments: OOZIE-2547-1.patch
>
>
> Currently, we pass jars using the --jars option when submitting a Spark job. We
> also add the spark.yarn.dist.files option in yarn-client mode.
> Instead, we can use only the --files option and pass the files that are
> present in mapreduce.job.cache.files. While doing so, we make sure that
> Spark won't make another copy of the files if they already exist on HDFS.
> We saw issues where files were copied multiple times, causing
> exceptions such as:
> {code}
> Diagnostics: Resource 
> hdfs://localhost/user/saley/.sparkStaging/application_1234_123/oozie-examples.jar 
> changed on src filesystem
> {code}
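The approach described in the comment (turning the entries of mapreduce.job.cache.files into a single --files value with fully qualified HDFS paths) can be sketched roughly as follows. This is not the code from OOZIE-2547-1.patch. The class name SparkFilesArg, the method build, and the example paths are hypothetical, and the sketch uses only java.net.URI rather than Hadoop's Path or FileSystem APIs. It assumes that Spark does not re-upload a file whose URI already points at the cluster's filesystem.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Oozie or the patch.
final class SparkFilesArg {

    /**
     * Builds a comma-separated value suitable for --files from a
     * comma-separated cache-file list. Entries that have no scheme are
     * qualified against defaultFs; duplicates are dropped, order is kept.
     */
    static String build(String cacheFiles, URI defaultFs) {
        Set<String> qualified = new LinkedHashSet<>();
        if (cacheFiles == null) {
            return "";
        }
        for (String entry : cacheFiles.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            URI uri = URI.create(trimmed);
            if (uri.getScheme() == null) {
                // Scheme-less absolute path: take scheme and authority from
                // the default filesystem; a "#alias" fragment is preserved.
                uri = defaultFs.resolve(uri);
            }
            qualified.add(uri.toString());
        }
        return String.join(",", qualified);
    }

    public static void main(String[] args) {
        // Assumed example values; a real launcher would read these from
        // the job configuration.
        URI defaultFs = URI.create("hdfs://localhost:8020");
        String cacheFiles = "/user/saley/lib/oozie-examples.jar,"
                + "hdfs://localhost:8020/user/saley/lib/oozie-examples.jar";
        System.out.println("--files " + build(cacheFiles, defaultFs));
    }
}
```

Qualifying every entry before de-duplicating means a scheme-less path and its fully qualified form collapse into one entry, so the same jar is not listed twice.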