[ https://issues.apache.org/jira/browse/HDFS-13157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925165#comment-16925165 ]

David Mollitor edited comment on HDFS-13157 at 9/8/19 1:01 PM:
---------------------------------------------------------------

Thank you all for the great investigatory work.

One question I have is in regard to the following:

bq. dropped until all other blocks have been tried and the iterator cycles 
round.

On the second run through the Iterator, shouldn't there be many fewer blocks in 
the list since many were successfully replicated away?


Regardless, I would like to propose a few ideas for addressing these concerns.

# Update the Iterator to rotate over the volumes. (It is common for an 
Iterator to make no ordering guarantee, so callers should not rely on one.) 
See the first sketch after this list.
# While the lock is held, turn each DataNode into a task, then process each 
task on its own thread. This shortens the time the lock is held, and the 
replication requests are interleaved across all the relevant DataNodes in 
the queue instead of being loaded DN by DN in serial fashion. See the 
second sketch after this list.
# Wrap each request in the queue with a TTL value. Requests that are 
rejected, for whatever reason, are placed back into the queue instead of 
being dropped. If a request cycles through the queue many times without 
being serviced, log an error and drop it. See the third sketch after this 
list.
# Update the documentation to cover this behavior, e.g., how many blocks 
can be replicated per day.
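
A rough sketch of the rotation idea, with hypothetical types (this is not 
the actual DatanodeAdminManager iterator): take one block from each volume 
in turn so no single volume is drained first.

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: round-robin over per-volume block iterators so one
// block is taken from each volume in turn instead of draining a volume first.
public final class RoundRobinBlockIterator<B> implements Iterator<B> {
  private final List<Iterator<B>> perVolume;
  private int next = 0;

  public RoundRobinBlockIterator(List<Iterator<B>> perVolume) {
    this.perVolume = new ArrayList<>(perVolume);
  }

  @Override
  public boolean hasNext() {
    perVolume.removeIf(it -> !it.hasNext()); // drop exhausted volumes
    return !perVolume.isEmpty();
  }

  @Override
  public B next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    next = next % perVolume.size();      // wrap after any removals
    return perVolume.get(next++).next(); // take one block, move to next volume
  }
}
{code}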
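
A rough sketch of the per-DataNode task idea, assuming a simple fixed-size 
executor and a caller-supplied scheduling callback (both hypothetical):

{code:java}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical sketch: one task per DataNode so replication scheduling is
// interleaved across nodes instead of handled DN by DN under a single lock.
final class PerDatanodeScheduler<D> {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  void scheduleAll(List<D> datanodes, Consumer<D> scheduleReplications) {
    for (D dn : datanodes) {
      pool.submit(() -> scheduleReplications.accept(dn)); // own thread per DN
    }
  }

  void shutdown() {
    pool.shutdown(); // stop accepting work once decommissioning completes
  }
}
{code}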
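
And a rough sketch of the TTL idea (the wrapper type and method names here 
are made up for illustration; the real queue element types would differ):

{code:java}
import java.util.Queue;

// Hypothetical sketch: wrap each replication request with a remaining-attempts
// counter; rejected requests are requeued, and only logged-and-dropped once
// the counter is exhausted.
public final class TtlRequest<R> {
  private final R request;
  private int remainingAttempts;

  public TtlRequest(R request, int maxAttempts) {
    this.request = request;
    this.remainingAttempts = maxAttempts;
  }

  public R request() {
    return request;
  }

  // Requeue after a rejection; returns false once the TTL is exhausted.
  public boolean requeueOrExpire(Queue<TtlRequest<R>> queue) {
    if (--remainingAttempts > 0) {
      return queue.offer(this);
    }
    System.err.println("Dropping request after TTL expired: " + request);
    return false;
  }
}
{code}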



> Do Not Remove Blocks Sequentially During Decommission 
> ------------------------------------------------------
>
>                 Key: HDFS-13157
>                 URL: https://issues.apache.org/jira/browse/HDFS-13157
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>    Affects Versions: 3.0.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>         Attachments: HDFS-13157.1.patch
>
>
> From what I understand of [DataNode 
> decommissioning|https://github.com/apache/hadoop/blob/42a1c98597e6dba2e371510a6b2b6b1fb94e4090/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminManager.java]
>  it appears that all the blocks are scheduled for removal _in order_. I'm 
> not 100% sure what the ordering is exactly, but I think it loops through each 
> data volume and schedules each block to be replicated elsewhere. The net 
> effect is that during a decommission, all of the DataNode transfer threads 
> slam on a single volume until it is cleaned out, at which point they all 
> slam on the next volume, and so on.
> Please randomize the block list so that there is a more even distribution 
> across all volumes when decommissioning a node.
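
A rough sketch of the randomization the description asks for, with 
hypothetical, simplified types:

{code:java}
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: shuffle the pending block list before scheduling so
// replication work spreads across volumes instead of draining them in order.
public final class ShuffledScheduling {
  public static <B> void scheduleShuffled(List<B> pendingBlocks,
      Consumer<B> scheduleReplication) {
    Collections.shuffle(pendingBlocks);         // break per-volume ordering
    pendingBlocks.forEach(scheduleReplication); // even spread across volumes
  }
}
{code}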


