[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead

2017-05-30 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-11609:
-
Fix Version/s: 2.9.0

> Some blocks can be permanently lost if nodes are decommissioned while dead
> --
>
> Key: HDFS-11609
> URL: https://issues.apache.org/jira/browse/HDFS-11609
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.0
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Blocker
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha4, 2.8.1
>
> Attachments: HDFS-11609.branch-2.patch, HDFS-11609.trunk.patch, 
> HDFS-11609_v2.branch-2.patch, HDFS-11609_v2.trunk.patch, 
> HDFS-11609_v3.branch-2.7.patch, HDFS-11609_v3.branch-2.patch, 
> HDFS-11609_v3.trunk.patch
>
>
> When all the nodes containing a replica of a block are decommissioned while 
> they are dead, they get decommissioned right away even if there are missing 
> blocks. This behavior was introduced by HDFS-7374.
> The problem starts when those decommissioned nodes are brought back online. 
> The namenode no longer shows missing blocks, which creates a false sense of 
> cluster health. When the decommissioned nodes are removed and reformatted, 
> the block data is permanently lost. The namenode will report missing blocks 
> after the heartbeat recheck interval (e.g. 10 minutes) from the moment the 
> last node is taken down.
> There are multiple issues in the code. Because some of them behave differently 
> in testing than in production, it took a while to reproduce the problem in a 
> unit test. I will present an analysis and a proposal soon.
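
As a rough illustration of the "heartbeat recheck interval (e.g. 10 minutes)" mentioned in the description: assuming the namenode treats a datanode as dead after 2 x dfs.namenode.heartbeat.recheck-interval + 10 x dfs.heartbeat.interval, and assuming the stock defaults of 300000 ms and 3 s, the delay works out to about ten and a half minutes. The Python sketch below only reproduces that arithmetic; the function and constant names are made up for illustration and are not Hadoop APIs.

```python
# Illustrative arithmetic only; the names below are hypothetical, not Hadoop APIs.
# Assumed defaults: dfs.namenode.heartbeat.recheck-interval = 300000 ms,
#                   dfs.heartbeat.interval = 3 s.
DEFAULT_RECHECK_INTERVAL_MS = 300_000
DEFAULT_HEARTBEAT_INTERVAL_S = 3


def heartbeat_expire_interval_ms(recheck_interval_ms=DEFAULT_RECHECK_INTERVAL_MS,
                                 heartbeat_interval_s=DEFAULT_HEARTBEAT_INTERVAL_S):
    """Silence period after which a datanode is considered dead (assumed formula)."""
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


if __name__ == "__main__":
    ms = heartbeat_expire_interval_ms()
    minutes, seconds = divmod(ms // 1000, 60)
    print(f"{ms} ms = {minutes} min {seconds} s")
```

Under these assumptions the result is 630000 ms, so missing blocks would only surface roughly ten and a half minutes after the last node holding a replica goes down.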



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead

2017-05-01 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11609:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.1
               3.0.0-alpha3
               2.7.4
       Status: Resolved  (was: Patch Available)

Committed to trunk through branch-2.7.

> Some blocks can be permanently lost if nodes are decommissioned while dead
> --
>
> Key: HDFS-11609
> URL: https://issues.apache.org/jira/browse/HDFS-11609
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.0
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Blocker
> Fix For: 2.7.4, 3.0.0-alpha3, 2.8.1
>
> Attachments: HDFS-11609.branch-2.patch, HDFS-11609.trunk.patch, 
> HDFS-11609_v2.branch-2.patch, HDFS-11609_v2.trunk.patch, 
> HDFS-11609_v3.branch-2.7.patch, HDFS-11609_v3.branch-2.patch, 
> HDFS-11609_v3.trunk.patch






[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead

2017-04-28 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11609:
--
Attachment: HDFS-11609_v3.trunk.patch
            HDFS-11609_v3.branch-2.patch
            HDFS-11609_v3.branch-2.7.patch

Attaching patches with the comment corrected. The code changes are identical 
to those in v2.

> Some blocks can be permanently lost if nodes are decommissioned while dead
> --
>
> Key: HDFS-11609
> URL: https://issues.apache.org/jira/browse/HDFS-11609
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.0
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Blocker
> Attachments: HDFS-11609.branch-2.patch, HDFS-11609.trunk.patch, 
> HDFS-11609_v2.branch-2.patch, HDFS-11609_v2.trunk.patch, 
> HDFS-11609_v3.branch-2.7.patch, HDFS-11609_v3.branch-2.patch, 
> HDFS-11609_v3.trunk.patch






[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead

2017-04-25 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-11609:
--
Target Version/s: 2.7.4, 2.8.1  (was: 2.8.1)

> Some blocks can be permanently lost if nodes are decommissioned while dead
> --
>
> Key: HDFS-11609
> URL: https://issues.apache.org/jira/browse/HDFS-11609
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.0
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Blocker
> Attachments: HDFS-11609.branch-2.patch, HDFS-11609.trunk.patch, 
> HDFS-11609_v2.branch-2.patch, HDFS-11609_v2.trunk.patch






[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead

2017-04-18 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11609:
--
Attachment: HDFS-11609_v2.trunk.patch
            HDFS-11609_v2.branch-2.patch

Attaching the updated patches.

> Some blocks can be permanently lost if nodes are decommissioned while dead
> --
>
> Key: HDFS-11609
> URL: https://issues.apache.org/jira/browse/HDFS-11609
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.0
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Blocker
> Attachments: HDFS-11609.branch-2.patch, HDFS-11609.trunk.patch, 
> HDFS-11609_v2.branch-2.patch, HDFS-11609_v2.trunk.patch






[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead

2017-04-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-11609:
--
Priority: Blocker  (was: Critical)

> Some blocks can be permanently lost if nodes are decommissioned while dead
> --
>
> Key: HDFS-11609
> URL: https://issues.apache.org/jira/browse/HDFS-11609
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.0
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Blocker
> Attachments: HDFS-11609.branch-2.patch, HDFS-11609.trunk.patch






[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead

2017-04-04 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11609:
--
Attachment: HDFS-11609.branch-2.patch

Attaching branch-2 version of the patch, as the trunk version was a phenomenal 
success.

> Some blocks can be permanently lost if nodes are decommissioned while dead
> --
>
> Key: HDFS-11609
> URL: https://issues.apache.org/jira/browse/HDFS-11609
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.0
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Attachments: HDFS-11609.branch-2.patch, HDFS-11609.trunk.patch






[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead

2017-04-03 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11609:
--
Status: Patch Available  (was: Open)

> Some blocks can be permanently lost if nodes are decommissioned while dead
> --
>
> Key: HDFS-11609
> URL: https://issues.apache.org/jira/browse/HDFS-11609
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.0
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Attachments: HDFS-11609.trunk.patch






[jira] [Updated] (HDFS-11609) Some blocks can be permanently lost if nodes are decommissioned while dead

2017-04-03 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-11609:
--
Attachment: HDFS-11609.trunk.patch

> Some blocks can be permanently lost if nodes are decommissioned while dead
> --
>
> Key: HDFS-11609
> URL: https://issues.apache.org/jira/browse/HDFS-11609
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.0
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Attachments: HDFS-11609.trunk.patch


