[
https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18024616#comment-18024616
]
ASF GitHub Bot commented on HDFS-16064:
---------------------------------------
github-actions[bot] closed pull request #6437: [HDFS-16064] backporting
HDFS-16064. Determine when to invalidate corrupt replicas based on number of
usable replicas (#4410)
URL: https://github.com/apache/hadoop/pull/6437
> Determine when to invalidate corrupt replicas based on number of usable
> replicas
> --------------------------------------------------------------------------------
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, namenode
> Affects Versions: 3.2.1
> Reporter: Kevin Wikant
> Assignee: Kevin Wikant
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.5
>
> Time Spent: 2h
> Remaining Estimate: 0h
>
> It seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as
> a non-issue under the assumption that, if the namenode & a datanode get into
> an inconsistent state for a given block pipeline, another datanode will be
> available to replicate the block to.
> While testing datanode decommissioning using "dfs.exclude.hosts", I
> encountered a scenario where decommissioning gets stuck indefinitely.
> Below is the progression of events:
> * there are initially 4 datanodes DN1, DN2, DN3, DN4
> * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
> * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in
> order to satisfy their minimum replication factor of 2
> * during this replication process,
> https://issues.apache.org/jira/browse/HDFS-721 is encountered, which causes
> the following inconsistent state:
> ** DN3 thinks it has the block pipeline in FINALIZED state
> ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode
> (DataXceiver for client at /DN2:45654 [Receiving block BP-YYY:blk_XXX]):
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation src: /DN2:45654
> dst: /DN3:9866;
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
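> The dead end above can be made concrete: the datanode rejects any incoming
> write for a block it already holds on disk, so as long as DN3 keeps the
> leftover FINALIZED replica, the namenode's re-replication attempts can never
> succeed. The sketch below is a hypothetical condensation of that guard; the
> exception name matches the real class, but the surrounding types & method
> are illustrative, not the actual FsDatasetImpl code.
> {code:java}
> import java.io.IOException;
> import java.util.HashMap;
> import java.util.Map;
>
> // Hypothetical stand-ins for the datanode's replica bookkeeping.
> class ReplicaAlreadyExistsException extends IOException {
>     ReplicaAlreadyExistsException(String msg) { super(msg); }
> }
>
> enum ReplicaState { RBW, FINALIZED }
>
> class ReplicaSketch {
>     final long blockId;
>     final ReplicaState state;
>     ReplicaSketch(long blockId, ReplicaState state) {
>         this.blockId = blockId;
>         this.state = state;
>     }
> }
>
> class DataNodeWriteGuard {
>     private final Map<Long, ReplicaSketch> replicaMap = new HashMap<>();
>
>     // Handles an incoming WRITE_BLOCK transfer. If any replica of the block
>     // is already on disk (here: the FINALIZED leftover from the broken
>     // pipeline), the write is rejected, matching the log message above.
>     ReplicaSketch createRbw(long blockId) throws IOException {
>         ReplicaSketch existing = replicaMap.get(blockId);
>         if (existing != null) {
>             throw new ReplicaAlreadyExistsException("Block blk_" + blockId
>                 + " already exists in state " + existing.state
>                 + " and thus cannot be created.");
>         }
>         ReplicaSketch created = new ReplicaSketch(blockId, ReplicaState.RBW);
>         replicaMap.put(blockId, created);
>         return created;
>     }
> }
> {code}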
> * the replication is attempted again, but:
> ** DN4 has the block
> ** DN1 and/or DN2 have the block, but don't count towards the minimum
> replication factor because they are being decommissioned
> ** DN3 does not have the block & cannot have the block replicated to it
> because of HDFS-721
> * the namenode repeatedly tries to replicate the block to DN3 & repeatedly
> fails; this continues indefinitely
> * therefore DN4 is the only live datanode with the block & the minimum
> replication factor of 2 cannot be satisfied
> * because the minimum replication factor cannot be satisfied for the
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be
> completed
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0):
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0,
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 ,
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0):
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0,
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 ,
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is
> current datanode entering maintenance: false
> {code}
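> The log above shows why the monitor never makes progress: only live replicas
> are counted against the expected replication, & decommissioning replicas are
> excluded. A minimal sketch of that sufficiency check, with illustrative
> names rather than the actual DatanodeAdminManager code:
> {code:java}
> // Illustrative names; not the actual DatanodeAdminManager code.
> class BlockCounts {
>     int liveReplicas;            // replicas on healthy, in-service nodes
>     int decommissioningReplicas; // replicas on draining nodes (DN1, DN2)
> }
>
> class DecommissionMonitorSketch {
>     // A datanode can leave DECOMMISSION_INPROGRESS only once every one of
>     // its blocks reports true here.
>     static boolean isSufficientlyReplicated(BlockCounts c, int expected) {
>         // Decommissioning replicas are deliberately excluded: they are
>         // about to disappear, so they cannot satisfy the target.
>         return c.liveReplicas >= expected;
>     }
>
>     public static void main(String[] args) {
>         BlockCounts c = new BlockCounts();
>         c.liveReplicas = 1;            // only DN4, per the log above
>         c.decommissioningReplicas = 2; // DN1 & DN2
>         // Prints false on every monitor pass -> decommissioning never ends.
>         System.out.println(isSufficientlyReplicated(c, 2));
>     }
> }
> {code}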
> Being stuck in the decommissioning state forever is not an intended behavior
> of DataNode decommissioning.
> A few potential solutions:
> * Address the root cause of the problem which is an inconsistent state
> between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
> * Detect when datanode decommissioning is stuck due to lack of available
> datanodes for satisfying the minimum replication factor, then recover by
> re-enabling the datanodes being decommissioned
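> For reference, the fix that eventually landed (per the issue title & Fix
> Versions above) appears to take a third route: decide whether a stale or
> corrupt replica may be invalidated based on the number of *usable* replicas
> (live + decommissioning) rather than live replicas alone, so the bad copy on
> DN3 can be deleted & re-replication can proceed. A hedged sketch of that
> idea, with illustrative names rather than the actual BlockManager code:
> {code:java}
> // Illustrative names; not the actual BlockManager code.
> class ReplicaTally {
>     int live;            // in-service replicas (DN4)
>     int decommissioning; // replicas on draining nodes (DN1, DN2)
>     int usable() { return live + decommissioning; }
> }
>
> class InvalidationPolicySketch {
>     // Old behavior (simplified): only delete the bad copy when enough live
>     // replicas exist, so the stale FINALIZED replica on DN3 survives and
>     // keeps blocking re-replication.
>     static boolean invalidateByLiveOnly(ReplicaTally t, int expected) {
>         return t.live >= expected;
>     }
>
>     // Fixed behavior (simplified): decommissioning replicas still hold
>     // valid data, so it is safe to delete the bad copy on DN3; the block
>     // can then be re-replicated to DN3 & decommissioning can finish.
>     static boolean invalidateByUsable(ReplicaTally t, int expected) {
>         return t.usable() >= expected;
>     }
>
>     public static void main(String[] args) {
>         ReplicaTally t = new ReplicaTally();
>         t.live = 1;            // DN4
>         t.decommissioning = 2; // DN1 & DN2
>         System.out.println(invalidateByLiveOnly(t, 2)); // false -> stuck
>         System.out.println(invalidateByUsable(t, 2));   // true  -> recovers
>     }
> }
> {code}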
>