[ https://issues.apache.org/jira/browse/HADOOP-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696765#action_12696765 ]

Andrew Hitchcock commented on HADOOP-5635:
------------------------------------------

It sounds like what you want is a new feature, whereas this patch only fixes
a bug.

The current behavior is incorrect. If a user specifies a non-HDFS URI for the
distributed cache, the job fails because the tasks look for the file in HDFS.
This patch fixes that for cases where the user specifies a URI on another
distributed file system. With the patch, if a user specifies a KFS or S3N URI
(and that file system is properly configured), the job succeeds. The behavior
for a URI that is not accessible from every machine is unchanged: the job fails
because the tasks cannot reach the URI.
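
To make the difference concrete, here is a minimal, hypothetical sketch (not
the code in fix-distributed-cache.patch) that reduces the file system choice to
picking a URI scheme. The class and method names (CacheFsScheme, buggyScheme,
fixedScheme) are made up for illustration and no Hadoop classes are used; in
Hadoop itself a scheme-based lookup would presumably go through
FileSystem.get(URI, Configuration) rather than the default file system.

```java
import java.net.URI;

/**
 * Hypothetical illustration only: models the distributed cache's file system
 * choice as choosing a URI scheme. No Hadoop classes are involved.
 */
public class CacheFsScheme {

    // Old behavior as described in the issue: anything that is not an
    // hdfs:// URI silently falls back to the default file system.
    static String buggyScheme(URI cacheUri, URI defaultFs) {
        if ("hdfs".equals(cacheUri.getScheme())) {
            return "hdfs";
        }
        return defaultFs.getScheme();
    }

    // Scheme-based behavior: honor the scheme the user wrote (s3n, kfs, ...)
    // and fall back to the default file system only when there is no scheme.
    static String fixedScheme(URI cacheUri, URI defaultFs) {
        String scheme = cacheUri.getScheme();
        return (scheme != null) ? scheme : defaultFs.getScheme();
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:9000/");
        URI s3nFile = URI.create("s3n://my-bucket/lib/dict.txt");
        System.out.println(buggyScheme(s3nFile, defaultFs)); // prints "hdfs"
        System.out.println(fixedScheme(s3nFile, defaultFs)); // prints "s3n"
    }
}
```

With an hdfs:// default file system, an s3n:// cache file resolves to hdfs
under the old check (so tasks look in HDFS and fail) and to s3n under the
scheme-based one.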

I think a feature that lets administrators restrict distributed cache access to
certain file systems should be filed as a separate JIRA issue.

> distributed cache doesn't work with other distributed file systems
> ------------------------------------------------------------------
>
>                 Key: HADOOP-5635
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5635
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: filecache
>            Reporter: Andrew Hitchcock
>            Priority: Minor
>         Attachments: fix-distributed-cache.patch
>
>
> Currently the DistributedCache checks whether the file to be included
> is an HDFS URI. If the URI isn't in HDFS, it returns the default filesystem.
> This prevents using other distributed file systems, such as s3, s3n, or kfs,
> with the distributed cache. When a user tries to use one of those
> filesystems, it reports an error that it can't find the path in HDFS.

-- 
This message is automatically generated by JIRA.
