[
https://issues.apache.org/jira/browse/HADOOP-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592557#action_12592557
]
Joydeep Sen Sarma commented on HADOOP-3307:
-------------------------------------------
if 'har' is truly a client-side abstraction, then the assumption that the
underlying protocol is hdfs breaks this abstraction, no? one could imagine har
archives on top of the local file system, or for that matter KFS or any other
future file system (say Lustre).

also, the 'har' protocol is redundantly indicated in both the uri scheme and
the file extension. conceivably, one could drop it from the uri scheme (and
thereby retain the ability to work with different file systems) and use the
presence of the .har extension in the file path to automatically layer on an
archive file system.

if done right, one should be able to support any archive format, no?
essentially, we are just associating the .har extension as a trigger to switch
over to some nested file system (in this case, the har file system). one would
think that in future a .zip extension could be associated with a ZIP file
system provider, which would allow a nested view of the files/directories
underneath (this would be quite nice, since many data sets float around as
zip files: one could just copy them into hdfs and pronto, we are all set).
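
to make the extension-as-trigger idea concrete, here is a rough sketch in python (not hadoop code). every name in it (ARCHIVE_PROVIDERS, ArchiveMount, resolve) is hypothetical and made up for illustration; it only shows how a path could be split into the underlying file system scheme, the archive file, and the path inside the archive, without putting 'har' in the uri scheme.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical registry (an assumption, not an existing Hadoop API):
# file extension -> name of the nested file system provider to layer on.
ARCHIVE_PROVIDERS = {".har": "har", ".zip": "zip"}


@dataclass(frozen=True)
class ArchiveMount:
    provider: str      # nested file system to layer on ("har", "zip", ...)
    scheme: str        # scheme of the underlying file system, left untouched
    archive_path: str  # path of the archive file on the underlying file system
    inner_path: str    # path inside the archive ("" means the archive root)


def resolve(uri: str) -> Optional[ArchiveMount]:
    """Return the first archive found along the path, or None if there is none."""
    parts = urlsplit(uri)
    segments = parts.path.split("/")
    for i, segment in enumerate(segments):
        for ext, provider in ARCHIVE_PROVIDERS.items():
            # Require a non-empty base name so a bare ".har" is not a trigger.
            if segment.endswith(ext) and len(segment) > len(ext):
                return ArchiveMount(
                    provider=provider,
                    scheme=parts.scheme or "file",
                    archive_path="/".join(segments[: i + 1]),
                    inner_path="/".join(segments[i + 1:]),
                )
    return None


if __name__ == "__main__":
    print(resolve("hdfs://namenode/user/data/logs.har/2008/part-0"))
    print(resolve("file:///tmp/sets/census.zip/a.csv"))
    print(resolve("kfs://host/user/data/plain.txt"))
```

the point of the sketch is that the underlying scheme (hdfs, file, kfs, ...) passes through unchanged, and supporting another archive format only means adding one entry to the registry.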

i am also curious about the 'parallel creation' aspect (since that seems to be
the main argument for using a new archive format): how do we populate a single
hdfs file (backing the archive) in parallel?

> Archives in Hadoop.
> -------------------
>
> Key: HADOOP-3307
> URL: https://issues.apache.org/jira/browse/HADOOP-3307
> Project: Hadoop Core
> Issue Type: New Feature
> Components: fs
> Reporter: Mahadev konar
> Assignee: Mahadev konar
> Fix For: 0.18.0
>
>
> This is a new feature for archiving and unarchiving files in HDFS.