[ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890858#comment-16890858 ]
He Xiaoqiao edited comment on HDFS-12820 at 7/23/19 10:16 AM:
--------------------------------------------------------------

[~zhangchen] IIUC, {{nodesInService}} and the other attributes should be subtracted/updated when decommission is triggered. (Only checked on branch trunk, not on other branches.) FYI.
{code:java}
synchronized void startDecommission(final DatanodeDescriptor node) {
  if (!node.isAlive()) {
    LOG.info("Dead node {} is decommissioned immediately.", node);
    node.setDecommissioned();
  } else {
    stats.subtract(node);     // node is still counted as in service here
    node.startDecommission();
    stats.add(node);          // node is now in decommission-in-progress state
  }
}
{code}

was (Author: hexiaoqiao):
[~zhangchen] IIUC, {{nodesInService}} and the other attributes should be subtracted when decommission is triggered.
{code:java}
synchronized void startDecommission(final DatanodeDescriptor node) {
  if (!node.isAlive()) {
    LOG.info("Dead node {} is decommissioned immediately.", node);
    node.setDecommissioned();
  } else {
    stats.subtract(node);     // node is still counted as in service here
    node.startDecommission();
    stats.add(node);          // node is now in decommission-in-progress state
  }
}
{code}

> Decommissioned datanode is counted in service, causing datanode allocation failure
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-12820
>                 URL: https://issues.apache.org/jira/browse/HDFS-12820
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: block placement
>    Affects Versions: 2.4.0
>            Reporter: Gang Xie
>            Priority: Major
>
> When allocating a datanode for a dfsclient write with load taken into account, the namenode checks whether a datanode is overloaded by computing the average xceiver count over all in-service datanodes. But when a datanode is decommissioned and then becomes dead, it is still treated as in service, which skews the computed average load far away from the real one, especially when the number of decommissioned datanodes is large.
> In our cluster of 180 datanodes, with 100 of them decommissioned, the average load is 17. This failed all datanode allocation.
> private void subtract(final DatanodeDescriptor node) {
>   capacityUsed -= node.getDfsUsed();
>   blockPoolUsed -= node.getBlockPoolUsed();
>   xceiverCount -= node.getXceiverCount();
>   {color:red}if (!(node.isDecommissionInProgress() || node.isDecommissioned())) {{color}
>     nodesInService--;
>     nodesInServiceXceiverCount -= node.getXceiverCount();
>     capacityTotal -= node.getCapacity();
>     capacityRemaining -= node.getRemaining();
>   } else {
>     capacityTotal -= node.getDfsUsed();
>   }
>   cacheCapacity -= node.getCacheCapacity();
>   cacheUsed -= node.getCacheUsed();
> }

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
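To make the failure mode concrete, here is a minimal sketch (not the actual HDFS classes; the class and method names are invented for illustration) of the overload check the report describes, using the cluster numbers above. The 2.0 threshold factor mirrors the default "consider load" factor in the block placement policy; the exact direction and size of the skew depends on what xceiver counts the dead decommissioned nodes retained, so treat this as one plausible scenario, not a reproduction.

```java
// Sketch of the in-service load-average check, assuming dead decommissioned
// nodes are wrongly left in the in-service node count (the bug reported here).
public class InServiceLoadSketch {

    /** Average xceiver count per node counted as "in service". */
    static double inServiceXceiverAverage(int inServiceXceiverTotal,
                                          int nodesInService) {
        return nodesInService == 0
                ? 0.0
                : (double) inServiceXceiverTotal / nodesInService;
    }

    /** A candidate is rejected when its load exceeds 2x the average
     *  (the default consider-load factor; an assumption in this sketch). */
    static boolean isOverloaded(int nodeXceiverCount, double avgLoad) {
        return nodeXceiverCount > 2.0 * avgLoad;
    }

    public static void main(String[] args) {
        // Numbers from the report: 180 datanodes, 100 decommissioned and dead
        // (carrying no xceivers), so 80 live nodes share the whole load.
        int totalXceivers = 3060;

        // Buggy accounting: all 180 nodes counted in service -> average 17.0,
        // matching the "average load is 17" observed in the cluster.
        double buggyAvg = inServiceXceiverAverage(totalXceivers, 180);

        // Correct accounting: only the 80 live nodes are in service -> 38.25.
        double realAvg = inServiceXceiverAverage(totalXceivers, 80);

        // A typical live node carries ~38 xceivers: acceptable against the
        // real average, but over the diluted 2 * 17 = 34 threshold, so every
        // candidate is rejected and allocation fails.
        System.out.println(isOverloaded(38, realAvg));   // false
        System.out.println(isOverloaded(38, buggyAvg));  // true
    }
}
```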