[ https://issues.apache.org/jira/browse/HDFS-13157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920359#comment-16920359 ]

Chen Zhang edited comment on HDFS-13157 at 9/1/19 9:55 AM:
-----------------------------------------------------------

We observed the same problem on our production cluster about half a year 
ago, and it is very easy to reproduce when decommissioning a node with warm 
data. We use HDFS Raid to convert this data to erasure coding, so every 
block has only one replica, and therefore only one source node from which 
to replicate the data.

We observed the disks' I/O utilization rise to 100%, one volume after 
another, on the decommissioning node, which makes the decommission progress 
very slow.


> Do Not Remove Blocks Sequentially During Decommission 
> ------------------------------------------------------
>
>                 Key: HDFS-13157
>                 URL: https://issues.apache.org/jira/browse/HDFS-13157
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>    Affects Versions: 3.0.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>
> From what I understand of [DataNode 
> decommissioning|https://github.com/apache/hadoop/blob/42a1c98597e6dba2e371510a6b2b6b1fb94e4090/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminManager.java]
>  it appears that all the blocks are scheduled for removal _in order_. I'm 
> not 100% sure what the ordering is exactly, but I think it loops through each 
> data volume and schedules each block to be replicated elsewhere. The net 
> effect is that during a decommission, all of the DataNode transfer threads 
> slam on a single volume until it is cleaned out, at which point they all 
> slam on the next volume, and so on.
>
> Please randomize the block list so that there is a more even distribution 
> across all volumes when decommissioning a node.
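
For illustration, here is a minimal sketch of the randomization the issue 
requests, assuming a hypothetical Block type and scheduleReplication() helper 
(the real DatanodeAdminManager iterates the node's block list, so the actual 
patch would look different):

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ShuffleDecommissionSketch {

  // Stand-in for an HDFS block; volumeIndex marks the volume holding
  // its only replica on the decommissioning node. Not the real HDFS type.
  static class Block {
    final long blockId;
    final int volumeIndex;

    Block(long blockId, int volumeIndex) {
      this.blockId = blockId;
      this.volumeIndex = volumeIndex;
    }
  }

  // Stand-in for handing a block to the replication monitor.
  static void scheduleReplication(Block b) {
    System.out.println("replicate block " + b.blockId
        + " from volume " + b.volumeIndex);
  }

  public static void main(String[] args) {
    // Blocks arrive grouped by volume, mimicking the per-volume
    // iteration order described above.
    List<Block> pending = new ArrayList<>();
    for (int volume = 0; volume < 4; volume++) {
      for (long i = 0; i < 3; i++) {
        pending.add(new Block(volume * 100L + i, volume));
      }
    }

    // The requested fix: shuffle before scheduling, so consecutive
    // replication tasks hit different volumes instead of draining
    // one disk at a time.
    Collections.shuffle(pending);

    for (Block b : pending) {
      scheduleReplication(b);
    }
  }
}
{code}

With the shuffle in place, consecutive transfers read from different volumes, 
spreading I/O across all disks instead of saturating them one by one.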


