[ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890858#comment-16890858 ]
He Xiaoqiao edited comment on HDFS-12820 at 7/23/19 10:16 AM:
--------------------------------------------------------------

[~zhangchen] IIUC, {{nodesInService}} and the other attributes should be subtracted/updated when decommission is triggered. (Only checked on branch trunk, not on other branches.) FYI.
{code:java}
synchronized void startDecommission(final DatanodeDescriptor node) {
  if (!node.isAlive()) {
    LOG.info("Dead node {} is decommissioned immediately.", node);
    node.setDecommissioned();
  } else {
    stats.subtract(node);     // node is still counted as in service here
    node.startDecommission();
    stats.add(node);          // node is now in decommission-in-progress state
  }
}
{code}

was (Author: hexiaoqiao):
[~zhangchen] IIUC, {{nodesInService}} and the other attributes should be subtracted when decommission is triggered.
{code:java}
synchronized void startDecommission(final DatanodeDescriptor node) {
  if (!node.isAlive()) {
    LOG.info("Dead node {} is decommissioned immediately.", node);
    node.setDecommissioned();
  } else {
    stats.subtract(node);     // node is still counted as in service here
    node.startDecommission();
    stats.add(node);          // node is now in decommission-in-progress state
  }
}
{code}

> Decommissioned datanode is counted in service, causing datanode allocation failure
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-12820
>                 URL: https://issues.apache.org/jira/browse/HDFS-12820
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: block placement
>    Affects Versions: 2.4.0
>            Reporter: Gang Xie
>            Priority: Major
>
> When allocating a datanode for a dfsclient write with load taken into account, the namenode checks whether a datanode is overloaded by computing the average xceiver count over all in-service datanodes. But when a datanode is decommissioned and then becomes dead, it is still treated as in service, which skews the computed average load far away from the real one, especially when the number of decommissioned datanodes is large.
> In our cluster of 180 datanodes, with 100 of them decommissioned, the average load is 17. This failed all datanode allocation.
> private void subtract(final DatanodeDescriptor node) {
>   capacityUsed -= node.getDfsUsed();
>   blockPoolUsed -= node.getBlockPoolUsed();
>   xceiverCount -= node.getXceiverCount();
>   {color:red}if (!(node.isDecommissionInProgress() || node.isDecommissioned())) {{color}
>     nodesInService--;
>     nodesInServiceXceiverCount -= node.getXceiverCount();
>     capacityTotal -= node.getCapacity();
>     capacityRemaining -= node.getRemaining();
>   } else {
>     capacityTotal -= node.getDfsUsed();
>   }
>   cacheCapacity -= node.getCacheCapacity();
>   cacheUsed -= node.getCacheUsed();
> }

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
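To make the failure mode concrete, here is a minimal sketch (not the actual HDFS classes; the class and method names are invented for illustration) of the overload check the report describes, using the cluster numbers above. The 2.0 threshold factor mirrors the default "consider load" factor in the block placement policy; the exact direction and size of the skew depends on what xceiver counts the dead decommissioned nodes retained, so treat this as one plausible scenario, not a reproduction.

```java
// Sketch of the in-service load-average check, assuming dead decommissioned
// nodes are wrongly left in the in-service node count (the bug reported here).
public class InServiceLoadSketch {

    /** Average xceiver count per node counted as "in service". */
    static double inServiceXceiverAverage(int inServiceXceiverTotal,
                                          int nodesInService) {
        return nodesInService == 0
                ? 0.0
                : (double) inServiceXceiverTotal / nodesInService;
    }

    /** A candidate is rejected when its load exceeds 2x the average
     *  (the default consider-load factor; an assumption in this sketch). */
    static boolean isOverloaded(int nodeXceiverCount, double avgLoad) {
        return nodeXceiverCount > 2.0 * avgLoad;
    }

    public static void main(String[] args) {
        // Numbers from the report: 180 datanodes, 100 decommissioned and dead
        // (carrying no xceivers), so 80 live nodes share the whole load.
        int totalXceivers = 3060;

        // Buggy accounting: all 180 nodes counted in service -> average 17.0,
        // matching the "average load is 17" observed in the cluster.
        double buggyAvg = inServiceXceiverAverage(totalXceivers, 180);

        // Correct accounting: only the 80 live nodes are in service -> 38.25.
        double realAvg = inServiceXceiverAverage(totalXceivers, 80);

        // A typical live node carries ~38 xceivers: acceptable against the
        // real average, but over the diluted 2 * 17 = 34 threshold, so every
        // candidate is rejected and allocation fails.
        System.out.println(isOverloaded(38, realAvg));   // false
        System.out.println(isOverloaded(38, buggyAvg));  // true
    }
}
```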