[ 
https://issues.apache.org/jira/browse/HADOOP-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592557#action_12592557
 ] 

Joydeep Sen Sarma commented on HADOOP-3307:
-------------------------------------------

if 'har' is truly a client-side abstraction - then the assumption that the 
underlying protocol is hdfs breaks this abstraction - no? one could imagine har 
archives on top of the local file system - or for that matter KFS or any other 
future file system (say Lustre?).

also - the 'har' protocol is redundantly indicated in both the uri scheme and 
the file extension. conceivably - one could drop it from the uri scheme (and 
thereby retain the ability to work with different file systems) and use the 
presence of the .har extension in the file path to automatically layer on an 
archive file system.
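
to make the idea concrete, here is a rough sketch (in Python, purely for illustration) of how a client could detect the .har component in a path and split the uri into the archive location and the path inside the archive, leaving the scheme of the underlying file system alone. the function name `split_archive_path` and the example uris are assumptions made up for this sketch, not anything in Hadoop.

```python
from urllib.parse import urlsplit

ARCHIVE_EXT = ".har"  # the extension acts as the trigger, not the uri scheme


def split_archive_path(uri):
    """Split a uri at the first path component that ends in .har.

    Returns (archive_uri, inner_path). If no component carries the
    extension, returns (uri, None). The scheme and authority of the
    underlying file system are preserved as-is.
    """
    parts = urlsplit(uri)
    components = parts.path.split("/")
    for i, comp in enumerate(components):
        # require a non-empty base name in front of the extension
        if comp.endswith(ARCHIVE_EXT) and len(comp) > len(ARCHIVE_EXT):
            archive_path = "/".join(components[: i + 1])
            inner_path = "/" + "/".join(components[i + 1:])
            archive_uri = parts._replace(path=archive_path).geturl()
            return archive_uri, inner_path
    return uri, None


# hypothetical uris: the same logic applies whatever the underlying scheme is
print(split_archive_path("hdfs://namenode:8020/user/joy/logs.har/2008/04/part-0"))
print(split_archive_path("kfs://meta:20000/data/plain.txt"))
```

the point of the sketch is only that the underlying scheme (hdfs, kfs, ...) never needs to be known by the archive layer.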

if done right - one should be able to support any archive format, no? 
essentially - we are just associating the .har extension as a trigger to switch 
over to some nested file system (in this case, the har file system). one would 
think that in future a .zip extension could be associated with a ZIP file 
system provider which would allow a nested view of the files/directories 
underneath. (this would be quite nice, since many data sets float around as 
zip files. one could just copy them into hdfs - and pronto - we are all set.)
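
a minimal sketch of the extension-to-provider association, again in Python and only as an illustration: a table maps file extensions to provider classes, and opening a file whose name matches an extension yields a nested view. `ZipProvider`, `PROVIDERS` and `open_nested` are hypothetical names invented here; a har provider would presumably be registered the same way, but its format is not described in this thread so it is left out.

```python
import io
import zipfile


class ZipProvider:
    """Hypothetical provider exposing a nested view of a ZIP archive."""

    def __init__(self, fileobj):
        # zipfile.ZipFile accepts any seekable file-like object
        self._zip = zipfile.ZipFile(fileobj)

    def list(self):
        # names of the entries stored inside the archive
        return sorted(self._zip.namelist())

    def read(self, inner_path):
        # bytes of a single entry inside the archive
        return self._zip.read(inner_path)


# extension -> provider class; the extension alone selects the nested view
PROVIDERS = {".zip": ZipProvider}


def open_nested(name, fileobj):
    """Return a provider for the file if its extension is registered, else None."""
    for ext, provider in PROVIDERS.items():
        if name.endswith(ext):
            return provider(fileobj)
    return None


# usage: build a small zip in memory, then browse it through the registry
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("data/a.txt", "hello")
buf.seek(0)
view = open_nested("dataset.zip", buf)
print(view.list(), view.read("data/a.txt"))
```

here the file object stands in for a stream opened from whatever underlying file system holds the archive.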

am also curious about the 'parallel creation' aspect (since that seems to be 
the main argument for using a new archive format). how do we populate a single 
hdfs file (backing the archive) in parallel? 

> Archives in Hadoop.
> -------------------
>
>                 Key: HADOOP-3307
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3307
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Mahadev konar
>            Assignee: Mahadev konar
>             Fix For: 0.18.0
>
>
> This is a new feature for archiving and unarchiving files in HDFS. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
