[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead
[ https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HDFS-11609: - Fix Version/s: 2.9.0 > Some blocks can be permanently lost if nodes are decommissioned while dead > -- > > Key: HDFS-11609 > URL: https://issues.apache.org/jira/browse/HDFS-11609 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Blocker > Fix For: 2.9.0, 2.7.4, 3.0.0-alpha4, 2.8.1 > > Attachments: HDFS-11609.branch-2.patch, HDFS-11609.trunk.patch, > HDFS-11609_v2.branch-2.patch, HDFS-11609_v2.trunk.patch, > HDFS-11609_v3.branch-2.7.patch, HDFS-11609_v3.branch-2.patch, > HDFS-11609_v3.trunk.patch > > > When all the nodes containing a replica of a block are decommissioned while > they are dead, they get decommissioned right away even if there are missing > blocks. This behavior was introduced by HDFS-7374. > The problem starts when those decommissioned nodes are brought back online. > The namenode no longer shows missing blocks, which creates a false sense of > cluster health. When the decommissioned nodes are removed and reformatted, > the block data is permanently lost. The namenode will report missing blocks > after the heartbeat recheck interval (e.g. 10 minutes) from the moment the > last node is taken down. > There are multiple issues in the code. As some cause different behaviors in > testing vs. production, it took a while to reproduce it in a unit test. I > will present analysis and proposal soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead
[ https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-11609: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.1 3.0.0-alpha3 2.7.4 Status: Resolved (was: Patch Available) Committed to trunk through branch-2.7. > Some blocks can be permanently lost if nodes are decommissioned while dead > -- > > Key: HDFS-11609 > URL: https://issues.apache.org/jira/browse/HDFS-11609 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Blocker > Fix For: 2.7.4, 3.0.0-alpha3, 2.8.1 > > Attachments: HDFS-11609.branch-2.patch, HDFS-11609.trunk.patch, > HDFS-11609_v2.branch-2.patch, HDFS-11609_v2.trunk.patch, > HDFS-11609_v3.branch-2.7.patch, HDFS-11609_v3.branch-2.patch, > HDFS-11609_v3.trunk.patch > > > When all the nodes containing a replica of a block are decommissioned while > they are dead, they get decommissioned right away even if there are missing > blocks. This behavior was introduced by HDFS-7374. > The problem starts when those decommissioned nodes are brought back online. > The namenode no longer shows missing blocks, which creates a false sense of > cluster health. When the decommissioned nodes are removed and reformatted, > the block data is permanently lost. The namenode will report missing blocks > after the heartbeat recheck interval (e.g. 10 minutes) from the moment the > last node is taken down. > There are multiple issues in the code. As some cause different behaviors in > testing vs. production, it took a while to reproduce it in a unit test. I > will present analysis and proposal soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead
[ https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-11609: -- Attachment: HDFS-11609_v3.trunk.patch HDFS-11609_v3.branch-2.patch HDFS-11609_v3.branch-2.7.patch Attaching patches with the comment corrected. The code changes are identical to the v2. > Some blocks can be permanently lost if nodes are decommissioned while dead > -- > > Key: HDFS-11609 > URL: https://issues.apache.org/jira/browse/HDFS-11609 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Blocker > Attachments: HDFS-11609.branch-2.patch, HDFS-11609.trunk.patch, > HDFS-11609_v2.branch-2.patch, HDFS-11609_v2.trunk.patch, > HDFS-11609_v3.branch-2.7.patch, HDFS-11609_v3.branch-2.patch, > HDFS-11609_v3.trunk.patch > > > When all the nodes containing a replica of a block are decommissioned while > they are dead, they get decommissioned right away even if there are missing > blocks. This behavior was introduced by HDFS-7374. > The problem starts when those decommissioned nodes are brought back online. > The namenode no longer shows missing blocks, which creates a false sense of > cluster health. When the decommissioned nodes are removed and reformatted, > the block data is permanently lost. The namenode will report missing blocks > after the heartbeat recheck interval (e.g. 10 minutes) from the moment the > last node is taken down. > There are multiple issues in the code. As some cause different behaviors in > testing vs. production, it took a while to reproduce it in a unit test. I > will present analysis and proposal soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead
[ https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-11609: -- Target Version/s: 2.7.4, 2.8.1 (was: 2.8.1) > Some blocks can be permanently lost if nodes are decommissioned while dead > -- > > Key: HDFS-11609 > URL: https://issues.apache.org/jira/browse/HDFS-11609 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Blocker > Attachments: HDFS-11609.branch-2.patch, HDFS-11609.trunk.patch, > HDFS-11609_v2.branch-2.patch, HDFS-11609_v2.trunk.patch > > > When all the nodes containing a replica of a block are decommissioned while > they are dead, they get decommissioned right away even if there are missing > blocks. This behavior was introduced by HDFS-7374. > The problem starts when those decommissioned nodes are brought back online. > The namenode no longer shows missing blocks, which creates a false sense of > cluster health. When the decommissioned nodes are removed and reformatted, > the block data is permanently lost. The namenode will report missing blocks > after the heartbeat recheck interval (e.g. 10 minutes) from the moment the > last node is taken down. > There are multiple issues in the code. As some cause different behaviors in > testing vs. production, it took a while to reproduce it in a unit test. I > will present analysis and proposal soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead
[ https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-11609: -- Attachment: HDFS-11609_v2.trunk.patch HDFS-11609_v2.branch-2.patch Attaching the updated patches. > Some blocks can be permanently lost if nodes are decommissioned while dead > -- > > Key: HDFS-11609 > URL: https://issues.apache.org/jira/browse/HDFS-11609 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Blocker > Attachments: HDFS-11609.branch-2.patch, HDFS-11609.trunk.patch, > HDFS-11609_v2.branch-2.patch, HDFS-11609_v2.trunk.patch > > > When all the nodes containing a replica of a block are decommissioned while > they are dead, they get decommissioned right away even if there are missing > blocks. This behavior was introduced by HDFS-7374. > The problem starts when those decommissioned nodes are brought back online. > The namenode no longer shows missing blocks, which creates a false sense of > cluster health. When the decommissioned nodes are removed and reformatted, > the block data is permanently lost. The namenode will report missing blocks > after the heartbeat recheck interval (e.g. 10 minutes) from the moment the > last node is taken down. > There are multiple issues in the code. As some cause different behaviors in > testing vs. production, it took a while to reproduce it in a unit test. I > will present analysis and proposal soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead
[ https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated HDFS-11609: -- Priority: Blocker (was: Critical) > Some blocks can be permanently lost if nodes are decommissioned while dead > -- > > Key: HDFS-11609 > URL: https://issues.apache.org/jira/browse/HDFS-11609 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Blocker > Attachments: HDFS-11609.branch-2.patch, HDFS-11609.trunk.patch > > > When all the nodes containing a replica of a block are decommissioned while > they are dead, they get decommissioned right away even if there are missing > blocks. This behavior was introduced by HDFS-7374. > The problem starts when those decommissioned nodes are brought back online. > The namenode no longer shows missing blocks, which creates a false sense of > cluster health. When the decommissioned nodes are removed and reformatted, > the block data is permanently lost. The namenode will report missing blocks > after the heartbeat recheck interval (e.g. 10 minutes) from the moment the > last node is taken down. > There are multiple issues in the code. As some cause different behaviors in > testing vs. production, it took a while to reproduce it in a unit test. I > will present analysis and proposal soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead
[ https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-11609: -- Attachment: HDFS-11609.branch-2.patch Attaching branch-2 version of the patch, as the trunk version was a phenomenal success. > Some blocks can be permanently lost if nodes are decommissioned while dead > -- > > Key: HDFS-11609 > URL: https://issues.apache.org/jira/browse/HDFS-11609 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-11609.branch-2.patch, HDFS-11609.trunk.patch > > > When all the nodes containing a replica of a block are decommissioned while > they are dead, they get decommissioned right away even if there are missing > blocks. This behavior was introduced by HDFS-7374. > The problem starts when those decommissioned nodes are brought back online. > The namenode no longer shows missing blocks, which creates a false sense of > cluster health. When the decommissioned nodes are removed and reformatted, > the block data is permanently lost. The namenode will report missing blocks > after the heartbeat recheck interval (e.g. 10 minutes) from the moment the > last node is taken down. > There are multiple issues in the code. As some cause different behaviors in > testing vs. production, it took a while to reproduce it in a unit test. I > will present analysis and proposal soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead
[ https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-11609: -- Status: Patch Available (was: Open) > Some blocks can be permanently lost if nodes are decommissioned while dead > -- > > Key: HDFS-11609 > URL: https://issues.apache.org/jira/browse/HDFS-11609 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-11609.trunk.patch > > > When all the nodes containing a replica of a block are decommissioned while > they are dead, they get decommissioned right away even if there are missing > blocks. This behavior was introduced by HDFS-7374. > The problem starts when those decommissioned nodes are brought back online. > The namenode no longer shows missing blocks, which creates a false sense of > cluster health. When the decommissioned nodes are removed and reformatted, > the block data is permanently lost. The namenode will report missing blocks > after the heartbeat recheck interval (e.g. 10 minutes) from the moment the > last node is taken down. > There are multiple issues in the code. As some cause different behaviors in > testing vs. production, it took a while to reproduce it in a unit test. I > will present analysis and proposal soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead
[ https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-11609: -- Attachment: HDFS-11609.trunk.patch > Some blocks can be permanently lost if nodes are decommissioned while dead > -- > > Key: HDFS-11609 > URL: https://issues.apache.org/jira/browse/HDFS-11609 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: HDFS-11609.trunk.patch > > > When all the nodes containing a replica of a block are decommissioned while > they are dead, they get decommissioned right away even if there are missing > blocks. This behavior was introduced by HDFS-7374. > The problem starts when those decommissioned nodes are brought back online. > The namenode no longer shows missing blocks, which creates a false sense of > cluster health. When the decommissioned nodes are removed and reformatted, > the block data is permanently lost. The namenode will report missing blocks > after the heartbeat recheck interval (e.g. 10 minutes) from the moment the > last node is taken down. > There are multiple issues in the code. As some cause different behaviors in > testing vs. production, it took a while to reproduce it in a unit test. I > will present analysis and proposal soon. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org