[jira] [Commented] (HDFS-16064) Determine when to invalidate corrupt replicas based on number of usable replicas

2024-01-10 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805302#comment-17805302
 ] 

Rushabh Shah commented on HDFS-16064:
-

[~KevinWikant]  [~aajisaka]  Any reason why we haven't backported this fix to 
branch-2.10? 

> Determine when to invalidate corrupt replicas based on number of usable 
> replicas
> 
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.5
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to
> While testing datanode decommissioning using "dfs.exclude.hosts", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails, this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed 
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> {code}
> Being stuck in decommissioning state forever is not an intended behavior of 
> DataNode decommissioning
> A few potential solutions:
>  * Address the root cause of the problem which is an inconsistent state 
> between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
>  * Detect when datanode decommissioning is stuck due to lack of available 
> datanodes for satisfying the minimum replication factor, then recover by 
> re-enabling the datanodes being decommissioned
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16064) Determine when to invalidate corrupt replicas based on number of usable replicas

2024-01-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805315#comment-17805315
 ] 

ASF GitHub Bot commented on HDFS-16064:
---

zz12341 opened a new pull request, #6437:
URL: https://github.com/apache/hadoop/pull/6437

   …
   
   
   
   ### Description of PR
   
   [HDFS-16064](https://github.com/apache/hadoop/pull/4410) fixed an issue 
where decommissioning replicas were not counted as usable replicas, which 
caused decom to stuck forever in the case of small clusters. We are seeing the 
same issue on 2.10, and thus backporting the changes. 
   
   ### How was this patch tested?
   
   
   ### For code changes:
   
   - [ x] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> Determine when to invalidate corrupt replicas based on number of usable 
> replicas
> 
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.5
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to
> While testing datanode decommissioning using "dfs.exclude.hosts", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails, this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed 
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes havi

[jira] [Commented] (HDFS-16064) Determine when to invalidate corrupt replicas based on number of usable replicas

2024-01-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805317#comment-17805317
 ] 

ASF GitHub Bot commented on HDFS-16064:
---

shahrs87 commented on code in PR #6437:
URL: https://github.com/apache/hadoop/pull/6437#discussion_r1448086058


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:
##
@@ -791,15 +791,33 @@ public short getMinReplication() {
 return minReplication;
   }
 
+  public short getMinStorageNum(BlockInfo block) {

Review Comment:
   @zz12341  Why we want these changes? I don't see it in the original patch 
[here](https://github.com/apache/hadoop/pull/4410/files). 





> Determine when to invalidate corrupt replicas based on number of usable 
> replicas
> 
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.5
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to
> While testing datanode decommissioning using "dfs.exclude.hosts", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails, this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed 
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> {code}
> Being stuck in decommissioning state forever is not an intended behavior of 
> DataNode decommissioning
> A few potential solutions:
>  * Address the root cause of the problem which is an inconsistent state 
> between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
>  * Detect when datanode decommissioning is stuck due t

[jira] [Commented] (HDFS-16064) Determine when to invalidate corrupt replicas based on number of usable replicas

2024-01-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805324#comment-17805324
 ] 

ASF GitHub Bot commented on HDFS-16064:
---

zz12341 commented on code in PR #6437:
URL: https://github.com/apache/hadoop/pull/6437#discussion_r1448119281


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:
##
@@ -791,15 +791,33 @@ public short getMinReplication() {
 return minReplication;
   }
 
+  public short getMinStorageNum(BlockInfo block) {

Review Comment:
   I was originally trying to make it in sync with what trunk branch is doing: 
https://github.com/apache/hadoop/pull/4410/files#diff-305ecf45a0f0708849b5e3c0d21a56c681db3a1497e52a19ef24939278dc99feL1922-R1926
   
   Let me revert this change 





> Determine when to invalidate corrupt replicas based on number of usable 
> replicas
> 
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.5
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to
> While testing datanode decommissioning using "dfs.exclude.hosts", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails, this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed 
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> {code}
> Being stuck in decommissioning state forever is not an intended behavior of 
> DataNode decommissioning
> A few potential solutions:
>  * Address the root cause of the problem which is an inconsistent state 
> between namenode & datanod

[jira] [Commented] (HDFS-16064) Determine when to invalidate corrupt replicas based on number of usable replicas

2024-01-11 Thread Kevin Wikant (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805579#comment-17805579
 ] 

Kevin Wikant commented on HDFS-16064:
-

{quote}Any reason why we haven't backported this fix to branch-2.10? 
{quote}
Back in 2022, I did try to backport this change to 2.10.1 branch & encountered 
unit test failure due to inconsistent behavior when compared to Hadoop 3.x
{quote}> mvn test -Dtest=TestDecommission
...

[ERROR] Tests run: 27, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 
263.603 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDecommission
[ERROR] 
testDeleteCorruptReplicaForUnderReplicatedBlock(org.apache.hadoop.hdfs.TestDecommission)
  Time elapsed: 60.462 s  <<< ERROR!
java.lang.Exception: test timed out after 6 milliseconds
        at java.lang.Thread.sleep(Native Method)
        at 
org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:366)
        at 
org.apache.hadoop.hdfs.TestDecommission.testDeleteCorruptReplicaForUnderReplicatedBlock(TestDecommission.java:1918)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{quote}
I do not remember all the root cause details, but from my notes:
 * "The inconsistent behavior has to do with when Datanodes in the 
MiniDFSCluster are sending full block reports vs incremental block reports and 
how that gets handled by the Namenode. Also, the triggerBlockReport method does 
not work in a MiniDFSCluster (i.e. no block report is sent) and there is no way 
to control sending of incremental vs full block reports."

These Hadoop 2.x behavior differences in Namenode/Datanode/MiniDFSCluster were 
not fully root caused & addressed, so this bug fix was only backported to 
Hadoop 3.x which was sufficient for our needs.

> Determine when to invalidate corrupt replicas based on number of usable 
> replicas
> 
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.5
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to
> While testing datanode decommissioning using "dfs.exclude.hosts", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the na

[jira] [Commented] (HDFS-16064) Determine when to invalidate corrupt replicas based on number of usable replicas

2024-01-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805699#comment-17805699
 ] 

ASF GitHub Bot commented on HDFS-16064:
---

shahrs87 commented on code in PR #6437:
URL: https://github.com/apache/hadoop/pull/6437#discussion_r1449133212


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:
##
@@ -1463,21 +1463,28 @@ private void markBlockAsCorrupt(BlockToMarkCorrupt b,
 b.getReason(), b.getReasonCode());
 
 NumberReplicas numberOfReplicas = countNodes(b.getStored());
-boolean hasEnoughLiveReplicas = numberOfReplicas.liveReplicas() >=
-expectedReplicas;
+final int numUsableReplicas = numberOfReplicas.liveReplicas() +
+numberOfReplicas.decommissioning() +
+numberOfReplicas.liveEnteringMaintenanceReplicas();
+boolean hasEnoughLiveReplicas = numUsableReplicas >=
+expectedRedundancies;
+
 boolean minReplicationSatisfied =
 numberOfReplicas.liveReplicas() >= minReplication;

Review Comment:
   ```suggestion
   numUsableReplicas >= minReplication;
   ```





> Determine when to invalidate corrupt replicas based on number of usable 
> replicas
> 
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.5
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to
> While testing datanode decommissioning using "dfs.exclude.hosts", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails, this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed 
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN2:9866, I

[jira] [Commented] (HDFS-16064) Determine when to invalidate corrupt replicas based on number of usable replicas

2024-01-11 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805704#comment-17805704
 ] 

Rushabh Shah commented on HDFS-16064:
-

Thank you [~KevinWikant]  for your reply. We also hit this bug recently in our 
production cluster. One of my colleague is trying to backport this patch to 
branch-2.10. Let us see if we have any luck backporting this change. 

> Determine when to invalidate corrupt replicas based on number of usable 
> replicas
> 
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.5
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to
> While testing datanode decommissioning using "dfs.exclude.hosts", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails, this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed 
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> {code}
> Being stuck in decommissioning state forever is not an intended behavior of 
> DataNode decommissioning
> A few potential solutions:
>  * Address the root cause of the problem which is an inconsistent state 
> between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
>  * Detect when datanode decommissioning is stuck due to lack of available 
> datanodes for satisfying the minimum replication factor, then recover by 
> re-enabling the datanodes being decommissioned
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsu

[jira] [Commented] (HDFS-16064) Determine when to invalidate corrupt replicas based on number of usable replicas

2024-01-11 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805723#comment-17805723
 ] 

Rushabh Shah commented on HDFS-16064:
-

[~KevinWikant]  By any chance you have draft PR for branch-2.10 saved? 

> Determine when to invalidate corrupt replicas based on number of usable 
> replicas
> 
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.5
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to
> While testing datanode decommissioning using "dfs.exclude.hosts", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails, this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed 
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> {code}
> Being stuck in decommissioning state forever is not an intended behavior of 
> DataNode decommissioning
> A few potential solutions:
>  * Address the root cause of the problem which is an inconsistent state 
> between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
>  * Detect when datanode decommissioning is stuck due to lack of available 
> datanodes for satisfying the minimum replication factor, then recover by 
> re-enabling the datanodes being decommissioned
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16064) Determine when to invalidate corrupt replicas based on number of usable replicas

2024-01-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805728#comment-17805728
 ] 

ASF GitHub Bot commented on HDFS-16064:
---

shahrs87 commented on PR #6437:
URL: https://github.com/apache/hadoop/pull/6437#issuecomment-1887620199

   @zz12341  [branch-2.10](https://github.com/apache/hadoop/tree/branch-2.10) 
is the active branch for 2.10 (not 
[branch-2.10.0](https://github.com/apache/hadoop/tree/branch-2.10.0)). You can 
see all the active branches [here](https://github.com/apache/hadoop/branches). 




> Determine when to invalidate corrupt replicas based on number of usable 
> replicas
> 
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.5
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to
> While testing datanode decommissioning using "dfs.exclude.hosts", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails, this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed 
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> {code}
> Being stuck in decommissioning state forever is not an intended behavior of 
> DataNode decommissioning
> A few potential solutions:
>  * Address the root cause of the problem which is an inconsistent state 
> between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
>  * Detect when datanode decommissioning is stuck due to lack of available 
> datanodes for satisfying the minimum replication factor, then recover by 
> re-enabling the datanodes being decommissioned
>  



--
This mess

[jira] [Commented] (HDFS-16064) Determine when to invalidate corrupt replicas based on number of usable replicas

2024-01-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805742#comment-17805742
 ] 

ASF GitHub Bot commented on HDFS-16064:
---

shahrs87 commented on PR #6437:
URL: https://github.com/apache/hadoop/pull/6437#issuecomment-1887669346

   ![Screenshot 2024-01-11 at 9 45 49 
AM](https://github.com/apache/hadoop/assets/5529085/8b1fbfaf-17bf-49d2-87bc-b413ae94d2e6)
   @zz12341  This change in `BlockManager#addStoredBlock` is missing from the 
original patch? https://github.com/apache/hadoop/pull/4410
   
   




> Determine when to invalidate corrupt replicas based on number of usable 
> replicas
> 
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.5
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to
> While testing datanode decommissioning using "dfs.exclude.hosts", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails, this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed 
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> {code}
> Being stuck in decommissioning state forever is not an intended behavior of 
> DataNode decommissioning
> A few potential solutions:
>  * Address the root cause of the problem which is an inconsistent state 
> between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
>  * Detect when datanode decommissioning is stuck due to lack of available 
> datanodes for satisfying the minimum replication factor, then recover by 
> re-enabling the datanodes being decommissioned
>  



--
This mess

[jira] [Commented] (HDFS-16064) Determine when to invalidate corrupt replicas based on number of usable replicas

2024-01-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805765#comment-17805765
 ] 

ASF GitHub Bot commented on HDFS-16064:
---

shahrs87 commented on PR #6437:
URL: https://github.com/apache/hadoop/pull/6437#issuecomment-1887735325

   > [HDFS-16064] backporting HDFS-16064. Determine when to invalidate corrupt 
replicas based on number of usable replicas (#4410) #6437
   
   @zz12341  Keep the title and the commit message same as 
https://github.com/apache/hadoop/pull/4410. We shouldn't add backporting 
keyword in the title and commit message.




> Determine when to invalidate corrupt replicas based on number of usable 
> replicas
> 
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.5
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to
> While testing datanode decommissioning using "dfs.exclude.hosts", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails, this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed 
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> {code}
> Being stuck in decommissioning state forever is not an intended behavior of 
> DataNode decommissioning
> A few potential solutions:
>  * Address the root cause of the problem which is an inconsistent state 
> between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
>  * Detect when datanode decommissioning is stuck due to lack of available 
> datanodes for satisfying the minimum replication factor, then recover by 
> re-enabling the datanode