[ https://issues.apache.org/jira/browse/HDFS-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748056#action_12748056 ]
Konstantin Shvachko commented on HDFS-550:
------------------------------------------

Suffix ".unlinked" would be better.

> DataNode restarts may introduce corrupt/duplicated/lost replicas when handling detached replicas
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-550
>                 URL: https://issues.apache.org/jira/browse/HDFS-550
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node
>    Affects Versions: 0.21.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: Append Branch
>
>
> Current trunk first calls detach to unlink a finalized replica before appending to the block. Unlink is done by temporarily copying the block file from the "current" subtree to a directory called "detach" under the volume's data directory, then copying it back once the unlink succeeds. On restart, a datanode recovers a failed unlink by copying the replicas under "detach" back to "current".
> There are two bugs in this implementation:
> 1. The "detach" directory is not included in a snapshot, so a rollback will cause the detaching replicas to be lost.
> 2. After a replica is copied to the "detach" directory, the information about its original location is lost. The current implementation erroneously assumes that the replica to be unlinked is under "current". This lets two replica instances with the same block id coexist on a datanode. Also, if a replica under "detach" is corrupt, the corrupt replica is moved to "current" without being detected, polluting the datanode's data.
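To make the failure mode concrete, here is a minimal sketch of the copy-out/copy-back sequence described above, assuming a simplified flat directory layout; the class and method names (DetachSketch, unlinkBlock, recoverDetached) are illustrative, not the actual DataNode API.

{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of the detach/recover sequence described in the
// issue; not the real FSDataset code.
public class DetachSketch {
  private final File currentDir; // the volume's "current" subtree
  private final File detachDir;  // the volume's "detach" directory

  public DetachSketch(File volumeDataDir) {
    // Assumes both directories already exist.
    this.currentDir = new File(volumeDataDir, "current");
    this.detachDir = new File(volumeDataDir, "detach");
  }

  // Break hard links on a finalized replica: copy the block file out to
  // "detach" (creating a new inode), then move the copy back over the
  // original path.
  void unlinkBlock(File blockFile) throws IOException {
    File tmp = new File(detachDir, blockFile.getName());
    Files.copy(blockFile.toPath(), tmp.toPath(),
        StandardCopyOption.REPLACE_EXISTING);
    // If the datanode dies here, only the file name survives under
    // "detach"; the replica's original subdirectory is forgotten.
    Files.move(tmp.toPath(), blockFile.toPath(),
        StandardCopyOption.REPLACE_EXISTING);
  }

  // Restart recovery as currently implemented: blindly move everything
  // under "detach" to the top of "current".
  void recoverDetached() throws IOException {
    File[] leftovers = detachDir.listFiles();
    if (leftovers == null) {
      return;
    }
    for (File f : leftovers) {
      // Bug 2: the replica may still exist in a subdirectory of
      // "current", so this can leave two replicas with the same block
      // id; a corrupt file is also moved back without any verification.
      Files.move(f.toPath(),
          new File(currentDir, f.getName()).toPath(),
          StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
{code}

A fix along the lines discussed here would presumably record the replica's original location, or rename it in place with a suffix such as ".unlinked", rather than relocating it to a shared "detach" directory.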