[
https://issues.apache.org/jira/browse/MAPREDUCE-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833528#action_12833528
]
dhruba borthakur commented on MAPREDUCE-1491:
---------------------------------------------
Code changes look good. I have one question.
Suppose a directory /dhruba has 10 files in it. All files initially have a
replication factor of 3. Then the RaidNode creates a xxx_raid.har that replaces
all the parity files. Now suppose a user deletes the first file in /dhruba, so
/dhruba has only 9 files. This patch will then delete the har file associated
with /dhruba. At this point, all 9 files in /dhruba are left with a
replication factor of only 2! Am I understanding this right? Of course, the har
file will get recreated pretty soon, but for some amount of time (however small)
there could be only two replicas of a block. If my understanding is correct,
can we create another JIRA to address this situation?
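One possible mitigation, sketched below only for illustration (the class and method names are hypothetical, not what the patch does): restore the source files' replication before the stale parity har is deleted, so the directory never sits at reduced replication without parity protection.

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SafeHarRemoval {
  /**
   * Hypothetical helper: raise replication of every file under srcDir back to
   * targetRepl, and only then delete the stale parity HAR. The ordering avoids
   * the window where files have neither parity nor full replication.
   */
  public static void removeParityHar(Configuration conf, Path srcDir,
      Path parityHar, short targetRepl) throws IOException {
    FileSystem fs = srcDir.getFileSystem(conf);
    for (FileStatus stat : fs.listStatus(srcDir)) {
      if (!stat.isDir() && stat.getReplication() < targetRepl) {
        // Bring the source file back to full replication first.
        fs.setReplication(stat.getPath(), targetRepl);
      }
    }
    // Safe to drop the parity archive only after replication is restored.
    fs.delete(parityHar, true);
  }
}
{code}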
> Use HAR filesystem to merge parity files
> -----------------------------------------
>
> Key: MAPREDUCE-1491
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1491
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Rodrigo Schmidt
> Assignee: Rodrigo Schmidt
> Attachments: MAPREDUCE-1491.0.patch
>
>
> The HDFS raid implementation (HDFS-503) creates a parity file for every file
> that is RAIDed. This puts an additional burden on the memory requirements of
> the namenode. It would be nice if the parity files were combined together
> using the HadoopArchive (har) format.
> This was HDFS-684 before, but raid migrated to MAPREDUCE.
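For context, a minimal illustration of reading a parity file back out of such an archive through Hadoop's har:// scheme; the namenode address and archive path below are hypothetical, not taken from the patch.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadParityFromHar {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // A HAR path embeds the underlying filesystem, e.g.
    // har://hdfs-<namenode>:<port>/<archive>.har/<file-in-archive>
    Path parityInHar =
        new Path("har://hdfs-namenode:8020/raid/dhruba_raid.har/part-0");
    FileSystem harFs = parityInHar.getFileSystem(conf);
    FSDataInputStream in = harFs.open(parityInHar);
    byte[] buf = new byte[4096];
    int n = in.read(buf);
    System.out.println("Read " + n + " bytes of parity data from the HAR");
    in.close();
  }
}
{code}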