[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833528#action_12833528
 ] 

dhruba borthakur commented on MAPREDUCE-1491:
---------------------------------------------

Code changes look good. I have one question.

Suppose a directory /dhruba has 10 files in it. All files initially have a 
replication factor of 3. Then the RaidNode creates an xxx_raid.har that replaces 
all the parity files. Now suppose a user deletes the first file in /dhruba, so 
/dhruba has only 9 files. This patch will now delete the har file associated 
with /dhruba. At this point, all 9 remaining files in /dhruba are left with a 
replication factor of only 2! Am I understanding this right? Of course, the har 
file will get recreated pretty soon, but for some amount of time (however small) 
there could be only two replicas of a block. If my understanding is correct, 
can we create another JIRA that could address this situation?
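To make the window of exposure concrete, here is a minimal Python sketch of the scenario described above. All class and method names (RaidedDir, effective_redundancy, rebuild_har) are invented for illustration; this is not the actual RaidNode code, just a model of the timeline under the assumption that raided files run at replication 2 plus one unit of redundancy from the shared parity har.

```python
class RaidedDir:
    """Toy model of a raided directory: low per-file replication,
    shared redundancy via one parity har (names are hypothetical)."""

    def __init__(self, nfiles, replication=2):
        # Files already raided: replication lowered from 3 to 2,
        # with the difference made up by the shared parity har.
        self.files = {"f%d" % i: replication for i in range(nfiles)}
        self.has_parity_har = True

    def effective_redundancy(self, name):
        # The parity har contributes one extra unit of redundancy
        # while it exists.
        return self.files[name] + (1 if self.has_parity_har else 0)

    def delete(self, name):
        del self.files[name]
        # Per the patch: the har no longer matches the directory
        # contents, so it is deleted along with the file.
        self.has_parity_har = False

    def rebuild_har(self):
        # The RaidNode recreates the har on a later pass.
        self.has_parity_har = True

d = RaidedDir(10)
assert d.effective_redundancy("f1") == 3   # fully protected
d.delete("f0")
# Window of exposure: the 9 surviving files have only 2 replicas
# and no parity until the har is rebuilt.
assert d.effective_redundancy("f1") == 2
d.rebuild_har()
assert d.effective_redundancy("f1") == 3   # protection restored
```

The sketch only illustrates the timing gap; the follow-up JIRA would presumably close it by deferring har deletion until the replacement har exists.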

> Use HAR filesystem to merge parity files 
> -----------------------------------------
>
>                 Key: MAPREDUCE-1491
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1491
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Rodrigo Schmidt
>            Assignee: Rodrigo Schmidt
>         Attachments: MAPREDUCE-1491.0.patch
>
>
> The HDFS raid implementation (HDFS-503) creates a parity file for every file 
> that is RAIDed. This puts additional burden on the memory requirements of the 
> namenode. It would be nice if the parity files were combined together using 
> the HadoopArchive (har) format.
> This was HDFS-684 before, but raid migrated to MAPREDUCE.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.