[jira] [Commented] (HDFS-13157) Do Not Remove Blocks Sequentially During Decommission

David Mollitor (Jira) Thu, 12 Sep 2019 11:58:12 -0700


    [https://issues.apache.org/jira/browse/HDFS-13157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16928827#comment-16928827]


David Mollitor commented on HDFS-13157:
---------------------------------------

[~sodonnell]

Doing a 'random queue' is very tricky. Because selection is random, it is always 
mathematically possible that some item sits in the queue forever; there is no 
guarantee that it will ever be selected.
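To put a number on that concern, assume (an assumption for illustration, not something stated in the comment) that each poll picks uniformly among the k items currently waiting and that new work keeps the queue at k items. A given item then survives n consecutive polls with probability:

```latex
P(\text{item not selected in } n \text{ polls}) = \left(1 - \frac{1}{k}\right)^{n} > 0
\quad \text{for every finite } n,
\qquad \text{e.g. } k = n = 1000:\quad 0.999^{1000} \approx e^{-1} \approx 0.37
```

The probability shrinks geometrically but never reaches zero, so no finite bound on an item's wait can be guaranteed.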

I am thinking something like this as an alternative:

# Mark each node as decommissioning
# Grab the lock
# Create small batches of blocks, rolling through the list of DataNodes and 
each DataNode's list of volumes (as proposed in this patch)
# Wrap each item in the batch in a `Future` and submit it to the replication queue
# Release the lock
# Wait for every `Future` in the batch to complete (with a timeout)
# Repeat until done
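A minimal Java sketch of steps 2 to 7 above, under assumptions that are not in the comment: blocks are plain string ids, each inner list holds the blocks of one (DataNode, volume) pair so that the round-robin rolls across both, a `java.util.concurrent.ExecutorService` stands in for the replication queue, and `BatchedDecommission`, `Replicator` and `run` are hypothetical names, not HDFS APIs. Step 1 (marking the nodes) is left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

public class BatchedDecommission {

    /** Hypothetical replication action; in HDFS this would copy one block to another DataNode. */
    interface Replicator {
        void replicate(String block) throws Exception;
    }

    /**
     * Builds small round-robin batches under the lock, submits each block as a Future,
     * releases the lock, then waits (with a timeout) for the batch before building the next.
     * Returns the blocks that failed or did not finish in time so a caller could retry them.
     */
    static List<String> run(List<List<String>> volumes, int batchSize, long timeoutMs,
                            ExecutorService replicationQueue, ReentrantLock lock,
                            Replicator replicator) throws InterruptedException {
        List<Iterator<String>> cursors = new ArrayList<>();
        for (List<String> volume : volumes) {
            cursors.add(volume.iterator());
        }
        List<String> unfinished = new ArrayList<>();
        int next = 0; // volume the round-robin cursor visits next
        while (true) {
            List<String> batch = new ArrayList<>();
            List<Future<?>> futures = new ArrayList<>();
            lock.lock();
            try {
                // Take one block per volume in turn; stop after a full pass finds every volume empty.
                int emptyInARow = 0;
                while (batch.size() < batchSize && emptyInARow < cursors.size()) {
                    Iterator<String> it = cursors.get(next);
                    next = (next + 1) % cursors.size();
                    if (it.hasNext()) {
                        batch.add(it.next());
                        emptyInARow = 0;
                    } else {
                        emptyInARow++;
                    }
                }
                for (String block : batch) {
                    futures.add(replicationQueue.submit(() -> {
                        replicator.replicate(block);
                        return null;
                    }));
                }
            } finally {
                lock.unlock();
            }
            if (batch.isEmpty()) {
                return unfinished; // every volume is drained
            }
            // Wait for the whole batch, sharing one deadline across its futures.
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            for (int i = 0; i < futures.size(); i++) {
                try {
                    futures.get(i).get(Math.max(0, deadline - System.nanoTime()), TimeUnit.NANOSECONDS);
                } catch (TimeoutException | ExecutionException e) {
                    unfinished.add(batch.get(i)); // a real implementation might cancel or re-queue
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<List<String>> volumes = List.of(
                List.of("a1", "a2", "a3"), List.of("b1", "b2"), List.of("c1"));
        List<String> order = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        List<String> unfinished = run(volumes, 2, 1_000, pool, new ReentrantLock(), order::add);
        pool.shutdown();
        System.out.println(order);      // [a1, b1, c1, a2, b2, a3]
        System.out.println(unfinished); // []
    }
}
```

With a single worker thread the replication order equals the submission order, which makes the interleaving across volumes easy to see. Waiting on each batch bounds how much work is in flight, and the per-batch timeout keeps one slow transfer from stalling the loop forever.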

This would require the replication queue to accept a `Future`, which is probably 
not a bad thing anyway.
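One way a replication queue could "take a future" is sketched below. This is a hypothetical class (`FutureReplicationQueue`, `offer` and `processOne` are illustrative names, not the HDFS replication queue API): the caller enqueues a block together with a `CompletableFuture`, and the worker completes that future once the block is handled, which is the signal the batching loop would wait on.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical replication queue that takes a caller-supplied future with each block. */
public class FutureReplicationQueue {

    private static final class Item {
        final String block;
        final CompletableFuture<Void> done;

        Item(String block, CompletableFuture<Void> done) {
            this.block = block;
            this.done = done;
        }
    }

    private final BlockingQueue<Item> items = new LinkedBlockingQueue<>();

    /** Producer side: the decommission code enqueues a block together with its future. */
    public void offer(String block, CompletableFuture<Void> done) {
        items.add(new Item(block, done));
    }

    /** Worker side: take one block, replicate it, then complete (or fail) the caller's future. */
    public void processOne(Consumer<String> replicate) throws InterruptedException {
        Item item = items.take();
        try {
            replicate.accept(item.block);
            item.done.complete(null);
        } catch (RuntimeException e) {
            item.done.completeExceptionally(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FutureReplicationQueue queue = new FutureReplicationQueue();
        CompletableFuture<Void> f1 = new CompletableFuture<>();
        CompletableFuture<Void> f2 = new CompletableFuture<>();
        queue.offer("blk_1", f1);
        queue.offer("blk_2", f2);
        queue.processOne(b -> System.out.println("replicated " + b)); // replicated blk_1
        System.out.println(f1.isDone() + " " + f2.isDone());          // true false
    }
}
```

Because the future travels with the queue item, the producer can wait with a timeout (`f1.get(timeout, unit)`) without knowing which worker thread picks the block up.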


> Do Not Remove Blocks Sequentially During Decommission 
> ------------------------------------------------------
>
>                 Key: HDFS-13157
>                 URL: https://issues.apache.org/jira/browse/HDFS-13157
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>    Affects Versions: 3.0.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>         Attachments: HDFS-13157.1.patch
>
>
> From what I understand of [DataNode 
> decommissioning|https://github.com/apache/hadoop/blob/42a1c98597e6dba2e371510a6b2b6b1fb94e4090/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminManager.java],
> it appears that all the blocks are scheduled for removal _in order_. I'm 
> not 100% sure what the exact ordering is, but I think it loops through each 
> data volume and schedules each block to be replicated elsewhere. The net 
> effect is that during a decommission, all of the DataNode transfer threads 
> slam a single volume until it is cleaned out. At that point, they all 
> slam the next volume, and so on.
> Please randomize the block list so that there is a more even distribution 
> across all volumes when decommissioning a node.
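A minimal Java sketch contrasting the two orderings the issue describes. It assumes (this is not how `DatanodeAdminManager` represents blocks) that each volume's blocks are a list of string ids such as `vol0-blk3`; `BlockOrderSketch` and its methods are hypothetical. It counts how many distinct volumes the first few scheduled blocks come from, a rough proxy for how many disks the transfer threads read at once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class BlockOrderSketch {

    /** Order described in the issue: every block of the first volume, then the next volume, and so on. */
    static List<String> sequential(List<List<String>> volumes) {
        List<String> order = new ArrayList<>();
        for (List<String> volume : volumes) {
            order.addAll(volume);
        }
        return order;
    }

    /** Requested alternative: the same blocks in random order (seeded so a run is repeatable). */
    static List<String> randomized(List<List<String>> volumes, long seed) {
        List<String> order = sequential(volumes);
        Collections.shuffle(order, new Random(seed));
        return order;
    }

    /** Number of distinct volumes among the first n blocks; ids are assumed to look like "volN-blkM". */
    static long volumesTouched(List<String> order, int n) {
        return order.stream().limit(n).map(b -> b.substring(0, b.indexOf('-'))).distinct().count();
    }

    public static void main(String[] args) {
        List<List<String>> volumes = new ArrayList<>();
        for (int v = 0; v < 4; v++) {
            List<String> blocks = new ArrayList<>();
            for (int b = 0; b < 100; b++) {
                blocks.add("vol" + v + "-blk" + b);
            }
            volumes.add(blocks);
        }
        System.out.println(volumesTouched(sequential(volumes), 10)); // 1
        System.out.println(volumesTouched(randomized(volumes, 42L), 10));
    }
}
```

A one-time shuffle of a fixed list gives every block exactly one position, so it cannot starve a block the way a queue that picks at random among waiting items could. It only spreads load across volumes in expectation, whereas a round-robin across volumes spreads it deterministically.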



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

