[
https://issues.apache.org/jira/browse/HADOOP-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593070#action_12593070
]
Doug Cutting commented on HADOOP-3307:
--------------------------------------
> the intent is to change path to make it work....
Would you special-case the handling of "har:" URIs in Path? Or would you
always parse queries as part of the hierarchical path? Both of these sound
like bad ideas to me.
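To see why, consider how a generic URI parser splits a hypothetical archive URI that carries the in-archive location in its query string. The sketch uses Python's standard urllib.parse.urlsplit as a stand-in for generic URI parsing (it is not Hadoop's Path class), and the URI layout is invented for illustration. The hierarchical path stops at the archive, so a path class that keeps only that component loses the file location unless it either special-cases "har:" or folds the query into the path.

```python
from urllib.parse import urlsplit


def hierarchical_path(uri: str) -> str:
    """Return only the hierarchical path component, as a generic path class might."""
    return urlsplit(uri).path


# Hypothetical archive URI that carries the in-archive location in its query string.
uri = "har://namenode:8020/user/alice/logs.har?path=/2008/04/part-0"

parts = urlsplit(uri)
print(parts.scheme)            # har
print(parts.netloc)            # namenode:8020
print(hierarchical_path(uri))  # /user/alice/logs.har
print(parts.query)             # path=/2008/04/part-0
```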
We should not add special functionality to FileSystem or Path for "har:" URIs.
We have a proposal that layers cleanly on top of the existing FileSystem and
Path implementations. Alternatively, we might consider generic extensions to
FileSystem and/or Path, like symbolic links or mount points, to see whether
these might facilitate a more transparent archive implementation. But we
should not add special-purpose hacks for a particular archive format to these
generic classes.
Mounts of various sorts would be fairly easy to add, but perhaps not that easy
to use. I proposed a simple version above that requires no changes to existing
code. A mount capability that permitted one to attach a FileSystem
implementation at an arbitrary point in the URI space would not be overly hard
to add.
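A minimal sketch of such a mount capability, assuming a hypothetical MountTable that maps path prefixes to filesystem names and resolves a path by longest-prefix match. Filesystems are represented by plain strings rather than real FileSystem objects, and none of these names exist in Hadoop.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MountTable:
    """Hypothetical mount table: maps path prefixes to filesystem names."""
    mounts: dict[str, str] = field(default_factory=dict)

    def mount(self, prefix: str, fs: str) -> None:
        # Normalize so "/a/" and "/a" name the same mount point; "/" stays "/".
        self.mounts[prefix.rstrip("/") or "/"] = fs

    def resolve(self, path: str) -> tuple[str, str]:
        """Return (filesystem, path relative to its mount point) by longest-prefix match."""
        best = None
        for prefix in self.mounts:
            # Match whole path components only, so "/a" does not capture "/ab".
            if prefix == "/" or path == prefix or path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise FileNotFoundError(f"no mount covers {path}")
        rest = path if best == "/" else (path[len(best):] or "/")
        return self.mounts[best], rest
```

Matching on whole path components keeps a mount at /user/alice/logs.har from capturing /user/alice/logs.harx. The table itself is exactly the kind of state that every client and task would need to share.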
The primary downside of mount-based approaches is that they require state. One
would have to add something to the configuration or job for each mount point,
or require all FileSystem implementations to know how to store a mount, or add
a mount file type, or some such. Note that this is not a problem with Unix
mount, since there's only one system involved, but in a distributed system like
Hadoop we need to either transmit the mount points with code (e.g., in the job)
or somehow store them in the filesystem.
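One way to carry mount points with the job, sketched under assumptions: the configuration is modelled as a plain string-to-string dict, and the key example.fs.mounts is hypothetical, not a real Hadoop property. The mount table is serialized to JSON when the job is submitted and rebuilt on each worker.

```python
from __future__ import annotations

import json

# Hypothetical configuration key; not a real Hadoop property.
MOUNTS_KEY = "example.fs.mounts"


def save_mounts(conf: dict[str, str], mounts: dict[str, str]) -> None:
    """Store the mount table in the job configuration so every task sees it."""
    conf[MOUNTS_KEY] = json.dumps(mounts, sort_keys=True)


def load_mounts(conf: dict[str, str]) -> dict[str, str]:
    """Rebuild the mount table on a worker; a missing key means no mounts."""
    return json.loads(conf.get(MOUNTS_KEY, "{}"))


job_conf = {}
save_mounts(job_conf, {"/": "hdfs", "/user/alice/logs.har": "har"})
task_conf = dict(job_conf)  # stands in for shipping the configuration to a task
print(load_mounts(task_conf))
```

Every mount added this way enlarges the configuration shipped with each job, which is part of the cost of the stateful approach.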
The current proposal, embedding the URI of the archive within a "har:" URI,
will both solve the problems at hand and require no architectural changes to
the filesystem. The only downside is that archive file naming is a little
obtuse. Long-term, the addition of symbolic links to FileSystem might address
that, no?
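For contrast, the stateless approach needs only string manipulation: everything required to locate a file is inside the URI. The sketch assumes one possible layout, in which the underlying scheme is folded into the authority (hdfs plus namenode:8020 becomes hdfs-namenode:8020) and the archive is recognised by a .har suffix. The syntax and the helper names to_har_uri and from_har_uri are illustrative assumptions, not the format settled in this issue; the sketch also assumes the underlying scheme contains no "-".

```python
from __future__ import annotations

from urllib.parse import urlsplit


def to_har_uri(archive_uri: str, inner_path: str) -> str:
    """Embed an archive's URI and a path inside it into one har: URI (hypothetical layout)."""
    u = urlsplit(archive_uri)
    if not u.path.endswith(".har"):
        raise ValueError("archive path must end in .har")
    inner = inner_path.lstrip("/")
    # Fold the underlying scheme into the authority: hdfs + namenode:8020 -> hdfs-namenode:8020.
    return f"har://{u.scheme}-{u.netloc}{u.path}/{inner}"


def from_har_uri(har_uri: str) -> tuple[str, str]:
    """Recover (archive URI, path inside archive) from a har: URI (hypothetical layout)."""
    u = urlsplit(har_uri)
    if u.scheme != "har":
        raise ValueError("not a har: URI")
    # Split on the first "-" only, so hostnames containing "-" survive.
    scheme, _, authority = u.netloc.partition("-")
    components = u.path.split("/")
    # The archive ends at the first path component with a .har suffix.
    for i, name in enumerate(components):
        if name.endswith(".har"):
            archive_path = "/".join(components[: i + 1])
            inner = "/" + "/".join(components[i + 1 :])
            return f"{scheme}://{authority}{archive_path}", inner
    raise ValueError("no .har component in path")
```

The round trip needs no mount table or configuration, but the resulting name, har://hdfs-namenode:8020/user/alice/logs.har/2008/04/part-0, shows the naming cost mentioned above.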
> Archives in Hadoop.
> -------------------
>
> Key: HADOOP-3307
> URL: https://issues.apache.org/jira/browse/HADOOP-3307
> Project: Hadoop Core
> Issue Type: New Feature
> Components: fs
> Reporter: Mahadev konar
> Assignee: Mahadev konar
> Fix For: 0.18.0
>
>
> This is a new feature for archiving and unarchiving files in HDFS.