[ https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated HDFS-11609:
------------------------------
    Priority: Blocker  (was: Critical)

> Some blocks can be permanently lost if nodes are decommissioned while dead
> --------------------------------------------------------------------------
>
>                 Key: HDFS-11609
>                 URL: https://issues.apache.org/jira/browse/HDFS-11609
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.0
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Blocker
>         Attachments: HDFS-11609.branch-2.patch, HDFS-11609.trunk.patch
>
>
> When all the nodes containing a replica of a block are decommissioned while
> they are dead, they get decommissioned right away even if there are missing
> blocks. This behavior was introduced by HDFS-7374.
>
> The problem starts when those decommissioned nodes are brought back online.
> The namenode no longer shows missing blocks, which creates a false sense of
> cluster health. When the decommissioned nodes are removed and reformatted,
> the block data is permanently lost. The namenode will report missing blocks
> after the heartbeat recheck interval (e.g. 10 minutes) from the moment the
> last node is taken down.
>
> There are multiple issues in the code. As some cause different behaviors in
> testing vs. production, it took a while to reproduce it in a unit test. I
> will present analysis and proposal soon.