Todd Lipcon created HDFS-4799:
---------------------------------

             Summary: Corrupt replica can be prematurely removed from corruptReplicas map
                 Key: HDFS-4799
                 URL: https://issues.apache.org/jira/browse/HDFS-4799
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.0.4-alpha
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
            Priority: Blocker

We saw the following sequence of events in a cluster result in losing the most recent genstamp of a block:
- A client is writing to a pipeline of 3 nodes.
- Nodes in the pipeline failed over some period of time, leaving 3 old-genstamp replicas on the original three nodes, while the pipeline recruited 3 new replicas with a later genstamp.
-- So we have 6 total replicas in the cluster: 3 with old genstamps on downed nodes, and 3 with the latest genstamp.
- The cluster reboots, and the nodes with old genstamps block report first. Their replicas are correctly added to the corrupt replicas map, since their genstamp is too old.
- The nodes with the new genstamp then block report. When the last of them block reports, chooseExcessReplicates is called and incorrectly decides to remove the three good replicas, leaving only the old-genstamp replicas.
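
The sequence above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the NameNode code: the class GenstampReplicaSketch, its Replica type, and the methods findCorrupt and chooseExcess are hypothetical names invented for this example, and the rule "treat later-reported replicas as excess" is an assumption chosen to mirror the report order described here. It shows why losing the corrupt-replica entries makes all six replicas look live, so that the three good ones can be picked as excess.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: all names here are hypothetical, not HDFS classes.
public class GenstampReplicaSketch {

    /** A reported replica: the node holding it and its generation stamp. */
    public static final class Replica {
        public final String node;
        public final long genstamp;

        public Replica(String node, long genstamp) {
            this.node = node;
            this.genstamp = genstamp;
        }
    }

    /** Returns the nodes whose replica has a genstamp older than the latest one. */
    public static Set<String> findCorrupt(List<Replica> reported, long latestGenstamp) {
        Set<String> corrupt = new HashSet<>();
        for (Replica r : reported) {
            if (r.genstamp < latestGenstamp) {
                corrupt.add(r.node);
            }
        }
        return corrupt;
    }

    /**
     * Returns the nodes whose replicas would be deleted as excess.
     * Replicas on nodes in the corrupt set are not counted as live.
     * Assumption: among live replicas, those reported later are the excess ones.
     */
    public static List<String> chooseExcess(List<Replica> reported,
                                            Set<String> corrupt,
                                            int replication) {
        List<String> live = new ArrayList<>();
        for (Replica r : reported) {
            if (!corrupt.contains(r.node)) {
                live.add(r.node);
            }
        }
        List<String> excess = new ArrayList<>();
        for (int i = replication; i < live.size(); i++) {
            excess.add(live.get(i));
        }
        return excess;
    }

    public static void main(String[] args) {
        long latest = 2L;
        // Report order from the description: old-genstamp nodes first.
        List<Replica> reported = Arrays.asList(
            new Replica("dn1", 1L), new Replica("dn2", 1L), new Replica("dn3", 1L),
            new Replica("dn4", 2L), new Replica("dn5", 2L), new Replica("dn6", 2L));

        // Corrupt set intact: only 3 replicas are live, so nothing is excess.
        Set<String> corrupt = findCorrupt(reported, latest);
        System.out.println(chooseExcess(reported, corrupt, 3)); // prints []

        // Corrupt entries lost: all 6 look live and the good replicas are chosen.
        System.out.println(
            chooseExcess(reported, Collections.<String>emptySet(), 3)); // prints [dn4, dn5, dn6]
    }
}
```

In this model the only difference between the two calls is whether the corrupt set still holds the old-genstamp nodes, which is the condition the issue summary identifies.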