[ 
https://issues.apache.org/jira/browse/OOZIE-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322820#comment-15322820
 ] 

Rohini Palaniswamy commented on OOZIE-2547:
-------------------------------------------

bq. As a general question, have you tested this with NN HA? I'm concerned about 
SparkMain#fixFsDefaultUris because it does some URI host/port 
parsing/manipulation.
  It should be safe. If fs.defaultFS is defined as 
hdfs://cluster-nn.domain.com:8020, it rewrites each of the following URLs to 
hdfs://cluster-nn.domain.com:8020/path:
    - hdfs://cluster-nn.domain.com/path
    - hdfs:///path
    - hdfs://cluster-nn.domain.com:8020/path (remains the same)

If fs.defaultFS is defined as hdfs://cluster-nn.domain.com, then it will 
change all of them to hdfs://cluster-nn.domain.com/path. If a path's 
authority does not exactly match that of fs.defaultFS, Spark re-uploads the 
file to HDFS and then localizes it. The rewrite is there to avoid that extra 
copy.
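
The rewrite described above can be sketched roughly as follows. This is an 
illustration only, not the code from the patch: the class name 
DefaultFsUriSketch and the method alignWithDefaultFs are hypothetical, and it 
uses plain java.net.URI rather than the Hadoop FileSystem/Configuration APIs. 
It assumes a URI is rewritten only when its scheme matches fs.defaultFS and 
its host is either missing or equal to the default host.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch; not the actual SparkMain#fixFsDefaultUris code.
final class DefaultFsUriSketch {

    // Returns uri with its authority replaced by the authority of defaultFs
    // when both refer to the same filesystem; otherwise returns uri unchanged.
    static URI alignWithDefaultFs(URI uri, URI defaultFs) {
        String scheme = uri.getScheme();
        if (scheme == null || defaultFs.getHost() == null
                || !scheme.equalsIgnoreCase(defaultFs.getScheme())) {
            return uri; // different scheme, or nothing to align with
        }
        String host = uri.getHost();
        if (host != null && !host.equalsIgnoreCase(defaultFs.getHost())) {
            return uri; // a different cluster; leave it alone
        }
        try {
            // Take host and port from fs.defaultFS (port -1 means "no port"),
            // keep path, query and fragment from the original URI.
            return new URI(defaultFs.getScheme(), null, defaultFs.getHost(),
                    defaultFs.getPort(), uri.getPath(), uri.getQuery(),
                    uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://cluster-nn.domain.com:8020");
        String[] inputs = {
            "hdfs://cluster-nn.domain.com/path",
            "hdfs:///path",
            "hdfs://cluster-nn.domain.com:8020/path"
        };
        for (String in : inputs) {
            System.out.println(alignWithDefaultFs(URI.create(in), defaultFs));
        }
    }
}
```

With the default above, all three inputs come out as 
hdfs://cluster-nn.domain.com:8020/path. With an fs.defaultFS that has no port 
(for example a logical nameservice name under NN HA), the port is dropped 
instead, so the authority still matches fs.defaultFS exactly.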


> Add mapreduce.job.cache.files to spark action
> ---------------------------------------------
>
>                 Key: OOZIE-2547
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2547
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Satish Subhashrao Saley
>            Assignee: Satish Subhashrao Saley
>            Priority: Minor
>         Attachments: OOZIE-2547-1.patch, OOZIE-2547-4.patch, 
> yarn-cluster_launcher.txt
>
>
> Currently, we pass jars using the --jars option when submitting a Spark job. 
> Also, we add the spark.yarn.dist.files option in yarn-client mode. 
> Instead, we can use only the --files option and pass the files that are 
> present in mapreduce.job.cache.files. While doing so, we make sure that 
> Spark won't make another copy of the files if they already exist on HDFS. 
> We saw issues where files were copied multiple times, causing exceptions 
> such as:
> {code}
> Diagnostics: Resource 
> hdfs://localhost/user/saley/.sparkStaging/application_1234_123/oozie-examples.jar
>  changed on src filesystem
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
