[ 
https://issues.apache.org/jira/browse/HDFS-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caozhiqiang updated HDFS-16456:
-------------------------------
    Attachment: HDFS-16456.001.patch

> EC: Decommission a rack with only on dn will fail when the rack number is 
> equal with replication
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-16456
>                 URL: https://issues.apache.org/jira/browse/HDFS-16456
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec, namenode
>    Affects Versions: 3.1.1
>            Reporter: caozhiqiang
>            Priority: Major
>         Attachments: HDFS-16456.001.patch
>
>
> In below scenario, decommission will fail by TOO_MANY_NODES_ON_RACK reason:
>  # Enable EC policy, such as RS-6-3-1024k.
>  # The rack number in this cluster is equal with the replication number(9)
>  # A rack only has one DN, and decommission this DN.
> The root cause is in 
> BlockPlacementPolicyRackFaultTolerant::getMaxNodesPerRack() function, it will 
> give a limit parameter maxNodesPerRack for choose targets. In this scenario, 
> the maxNodesPerRack is 1, which means each rack can only be chosen one 
> datanode.
> When we decommission one dn which is only one node in its rack, the 
> chooseOnce() in BlockPlacementPolicyRackFaultTolerant::chooseTargetInOrder() 
> will throw NotEnoughReplicasException, but the exception will not be caught 
> and fail to fallback to chooseEvenlyFromRemainingRacks() function.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to