Todd Lipcon created HDFS-4799:
---------------------------------

             Summary: Corrupt replica can be prematurely removed from corruptReplicas map
                 Key: HDFS-4799
                 URL: https://issues.apache.org/jira/browse/HDFS-4799
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.0.4-alpha
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
            Priority: Blocker


We saw the following sequence of events on a cluster, which resulted in losing 
the replicas with the most recent genstamp of a block:
- A client is writing to a pipeline of 3 datanodes.
- Nodes in the pipeline fail over some period of time, so that the original 
three nodes are left holding 3 old-genstamp replicas and the pipeline has 
recruited 3 new replicas with a later genstamp.
-- So we have 6 total replicas in the cluster: 3 with old genstamps on the 
downed nodes, and 3 with the latest genstamp.
- The cluster reboots, and the nodes with old genstamps send their block 
reports first. The replicas are correctly added to the corrupt replicas map, 
since their genstamp is too old.
- The nodes with the new genstamp then send their block reports. When the last 
of them reports, chooseExcessReplicates is called and incorrectly decides to 
remove the three good replicas, leaving only the old-genstamp replicas.
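
The sequence above can be illustrated with a small toy model. This is only a 
sketch and not the actual BlockManager code: the class GenstampSketch, the 
node names dn1..dn6, the genstamp values 1 and 2, and both removal policies 
are hypothetical. In particular, the "naive" policy (treat every reported 
replica as live and drop the latest reporters) is an assumption chosen to 
reproduce the outcome described here; the report does not state how the real 
selection went wrong.

```java
import java.util.ArrayList;
import java.util.List;

final class GenstampSketch {

    /** One replica as the toy namenode sees it: hosting node plus generation stamp. */
    static final class Replica {
        final String node;
        final long genstamp;

        Replica(String node, long genstamp) {
            this.node = node;
            this.genstamp = genstamp;
        }
    }

    /** The six replicas from the report, in block-report order (stale nodes first). */
    static List<Replica> scenario() {
        List<Replica> reported = new ArrayList<>();
        reported.add(new Replica("dn1", 1L)); // old genstamp
        reported.add(new Replica("dn2", 1L)); // old genstamp
        reported.add(new Replica("dn3", 1L)); // old genstamp
        reported.add(new Replica("dn4", 2L)); // latest genstamp
        reported.add(new Replica("dn5", 2L)); // latest genstamp
        reported.add(new Replica("dn6", 2L)); // latest genstamp
        return reported;
    }

    /**
     * Assumed faulty policy: every reported replica counts as live, and
     * everything reported after the first `replication` replicas is removed.
     */
    static List<String> naiveRemovals(List<Replica> reported, int replication) {
        List<String> toRemove = new ArrayList<>();
        for (int i = replication; i < reported.size(); i++) {
            toRemove.add(reported.get(i).node);
        }
        return toRemove;
    }

    /**
     * Assumed guarded policy: stale replicas never count toward the
     * replication target, so they are removed before any up-to-date replica.
     */
    static List<String> genstampAwareRemovals(List<Replica> reported,
                                              long latestGenstamp,
                                              int replication) {
        List<String> toRemove = new ArrayList<>();
        int live = 0;
        for (Replica r : reported) {
            if (r.genstamp < latestGenstamp) {
                toRemove.add(r.node);      // stale replica: always a removal candidate
            } else if (++live > replication) {
                toRemove.add(r.node);      // surplus up-to-date replica
            }
        }
        return toRemove;
    }

    public static void main(String[] args) {
        List<Replica> reported = scenario();
        // Ignoring genstamps removes the three good replicas.
        System.out.println(naiveRemovals(reported, 3));             // [dn4, dn5, dn6]
        // Respecting genstamps removes only the stale replicas.
        System.out.println(genstampAwareRemovals(reported, 2L, 3)); // [dn1, dn2, dn3]
    }
}
```

In this model the data loss comes entirely from counting stale replicas toward 
the replication target; once they are excluded, the three latest-genstamp 
replicas exactly satisfy a replication factor of 3 and none of them is chosen 
for removal.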

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira