[
https://issues.apache.org/jira/browse/HADOOP-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696765#action_12696765
]
Andrew Hitchcock commented on HADOOP-5635:
------------------------------------------
It sounds like what you want is a new feature, whereas this patch just fixes a
bug.
Currently the behavior is not right: if a user specifies a non-HDFS URI for the
distributed cache, the job fails because the tasks look for the file in HDFS.
This patch fixes that for the case where the user specifies a URI on another
distributed file system. With the patch, if a user specifies a KFS or S3N URI
(and that file system is properly configured), the job succeeds. The behavior
for a URI that is not accessible from every machine is unchanged: the job fails
because the tasks cannot reach the URI.
I think a feature letting administrators restrict distributed cache access to
certain file systems should be filed as a new JIRA.
> distributed cache doesn't work with other distributed file systems
> ------------------------------------------------------------------
>
> Key: HADOOP-5635
> URL: https://issues.apache.org/jira/browse/HADOOP-5635
> Project: Hadoop Core
> Issue Type: Bug
> Components: filecache
> Reporter: Andrew Hitchcock
> Priority: Minor
> Attachments: fix-distributed-cache.patch
>
>
> Currently the DistributedCache checks whether the file to be included is an
> HDFS URI. If the URI isn't in HDFS, it returns the default filesystem. This
> prevents using other distributed file systems -- such as s3, s3n, or kfs --
> with the distributed cache. When a user tries to use one of those file
> systems, it reports an error that it can't find the path in HDFS.