[ https://issues.apache.org/jira/browse/HDFS-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748056#action_12748056 ]
Konstantin Shvachko commented on HDFS-550:
------------------------------------------

Suffix ".unlinked" would be better.

> DataNode restarts may introduce corrupt/duplicated/lost replicas when handling detached replicas
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-550
>                 URL: https://issues.apache.org/jira/browse/HDFS-550
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node
>    Affects Versions: 0.21.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: Append Branch
>
>
> Current trunk first calls detach to unlink a finalized replica before appending to the block. Unlink is done by temporarily copying the block file from the "current" subtree to a directory called "detach" under the volume's data directory, then copying it back once the unlink succeeds. On restart, a datanode recovers a failed unlink by copying the replicas under "detach" back to "current".
> There are two bugs in this implementation:
> 1. The "detach" directory is not included in a snapshot, so a rollback will cause the detaching replicas to be lost.
> 2. After a replica is copied to the "detach" directory, the information about its original location is lost. The current implementation erroneously assumes that the replica to be unlinked is under "current". This lets two replica instances with the same block id coexist on a datanode. Also, if a replica under "detach" is corrupt, the corrupt replica is moved to "current" without being detected, polluting the datanode's data.
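To make the failure mode concrete, here is a minimal sketch of the copy-out/copy-back sequence described above, assuming a simplified flat directory layout; the class and method names (DetachSketch, unlinkBlock, recoverDetached) are illustrative, not the actual DataNode API.

{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of the detach/recover sequence described in the
// issue; not the real FSDataset code.
public class DetachSketch {
  private final File currentDir; // the volume's "current" subtree
  private final File detachDir;  // the volume's "detach" directory

  public DetachSketch(File volumeDataDir) {
    // Assumes both directories already exist.
    this.currentDir = new File(volumeDataDir, "current");
    this.detachDir = new File(volumeDataDir, "detach");
  }

  // Break hard links on a finalized replica: copy the block file out to
  // "detach" (creating a new inode), then move the copy back over the
  // original path.
  void unlinkBlock(File blockFile) throws IOException {
    File tmp = new File(detachDir, blockFile.getName());
    Files.copy(blockFile.toPath(), tmp.toPath(),
        StandardCopyOption.REPLACE_EXISTING);
    // If the datanode dies here, only the file name survives under
    // "detach"; the replica's original subdirectory is forgotten.
    Files.move(tmp.toPath(), blockFile.toPath(),
        StandardCopyOption.REPLACE_EXISTING);
  }

  // Restart recovery as currently implemented: blindly move everything
  // under "detach" to the top of "current".
  void recoverDetached() throws IOException {
    File[] leftovers = detachDir.listFiles();
    if (leftovers == null) {
      return;
    }
    for (File f : leftovers) {
      // Bug 2: the replica may still exist in a subdirectory of
      // "current", so this can leave two replicas with the same block
      // id; a corrupt file is also moved back without any verification.
      Files.move(f.toPath(),
          new File(currentDir, f.getName()).toPath(),
          StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
{code}

A fix along the lines discussed here would presumably record the replica's original location, or rename it in place with a suffix such as ".unlinked", rather than relocating it to a shared "detach" directory.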