[ https://issues.apache.org/jira/browse/HADOOP-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592678#action_12592678 ]
Mahadev konar commented on HADOOP-3307:
---------------------------------------

> am also curious about the 'parallel creation' aspect (since that seems to be
> the main argument for using a new archive format). how do we populate a
> single hdfs file (backing the archive) in parallel?

The archive isn't a single file backed by an index; it is made up of multiple files. Quoting from the design posted earlier in the comments:

The format of an archive as a filesystem path is:

/user/mahadev/foo.har/_index*
/user/mahadev/foo.har/part-*

The indexes store the filenames and their offsets within the part files. Each map would create part-$i files, and a single reduce or multiple reduces could create the index files in the archive directory.

Does that help in understanding the design?

> Archives in Hadoop.
> -------------------
>
>                 Key: HADOOP-3307
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3307
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Mahadev konar
>            Assignee: Mahadev konar
>             Fix For: 0.18.0
>
>
> This is a new feature for archiving and unarchiving files in HDFS.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
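
To make the layout described in the comment concrete, here is a rough local-filesystem sketch of the idea: independent writers each produce a part-$i file, and a final step merges their entries into an index that maps filenames to offsets within the parts. It is only an illustration, not the HADOOP-3307 implementation. The function names, the single tab-separated `_index` file, and the stored length field are all assumptions made for this example; the real index format is not specified in this message, and no MapReduce or HDFS API is used.

```python
import os
import tempfile


def write_part(archive_dir, part_id, files):
    """Stand-in for one map task: concatenate its files into part-<id>.

    Returns hypothetical index entries: (name, part file, offset, length).
    """
    part_name = f"part-{part_id}"
    entries = []
    offset = 0
    with open(os.path.join(archive_dir, part_name), "wb") as out:
        for name, data in files:
            out.write(data)
            entries.append((name, part_name, offset, len(data)))
            offset += len(data)
    return entries


def write_index(archive_dir, all_entries):
    """Stand-in for the reduce: merge every writer's entries into one index."""
    # Assumed format: one tab-separated line per archived file.
    with open(os.path.join(archive_dir, "_index"), "w") as idx:
        for name, part, offset, length in sorted(all_entries):
            idx.write(f"{name}\t{part}\t{offset}\t{length}\n")


def read_file(archive_dir, wanted):
    """Look a file up in the index, then seek into the part file holding it."""
    with open(os.path.join(archive_dir, "_index")) as idx:
        for line in idx:
            name, part, offset, length = line.rstrip("\n").split("\t")
            if name == wanted:
                with open(os.path.join(archive_dir, part), "rb") as f:
                    f.seek(int(offset))
                    return f.read(int(length))
    raise FileNotFoundError(wanted)


def build_demo_archive():
    """Build a tiny foo.har directory with two parts and one index."""
    archive_dir = os.path.join(tempfile.mkdtemp(), "foo.har")
    os.makedirs(archive_dir)
    entries = []
    # The two write_part calls are independent of each other, which is
    # what would allow the parts to be created in parallel.
    entries += write_part(archive_dir, 0, [("/a.txt", b"alpha"), ("/b.txt", b"bravo!")])
    entries += write_part(archive_dir, 1, [("/c.txt", b"charlie")])
    write_index(archive_dir, entries)
    return archive_dir
```

Calling `read_file(build_demo_archive(), "/b.txt")` reads the bytes for that entry back out of part-0. Because no writer touches another writer's part file, only the index step needs to see all entries, which is the property the comment relies on for parallel creation.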