[ 
https://issues.apache.org/jira/browse/HDFS-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215654#comment-17215654
 ] 

Fengnan Li commented on HDFS-15634:
-----------------------------------

Thanks for the comment [~sodonnell]. More context here:

We were decommissioning these datanodes to replace them with better hardware, 
so they would not be used anymore.

We are running 2.8.2, with about 350K blocks on each datanode after it is 
decommissioned. We stopped ~200 datanodes at once (sounds crazy... and it is).

I attached the graph of the write lock at that time. !write lock.png!

The goal of this ticket is not really about whether there will be missing 
blocks. I don't think there will be any unless you decommission the datanodes 
holding all replicas of a block at the same time, which is outside this 
discussion. What I am proposing is to mitigate the impact on namenode 
performance. From this perspective, neither recommissioning a datanode that 
still holds all of its blocks nor stopping the node and having the namenode 
clean up all the blocks at once is ideal.
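
To make the lock-hold argument concrete, here is a toy sketch. It is not 
NameNode code: the class InvalidationSketch and all of its methods are 
hypothetical, and a plain ReentrantReadWriteLock stands in for the namesystem 
lock. It only shows that removing the same number of blocks in small batches 
bounds the work done per write-lock hold.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model only: hypothetical names, not the NameNode's data structures. */
class InvalidationSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Deque<Long> pendingBlocks = new ArrayDeque<>();
    private int maxBlocksPerLockHold = 0;

    InvalidationSketch(int blockCount) {
        for (long id = 0; id < blockCount; id++) {
            pendingBlocks.add(id);
        }
    }

    /** Removes up to batchSize blocks during a single write-lock hold. */
    int removeBatch(int batchSize) {
        lock.writeLock().lock();
        try {
            int removed = 0;
            while (removed < batchSize && !pendingBlocks.isEmpty()) {
                pendingBlocks.poll();
                removed++;
            }
            // Track the largest amount of work done under one lock hold.
            maxBlocksPerLockHold = Math.max(maxBlocksPerLockHold, removed);
            return removed;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Drains all blocks; returns how many batches removed at least one block. */
    int drain(int batchSize) {
        int batches = 0;
        while (removeBatch(batchSize) > 0) {
            batches++;
        }
        return batches;
    }

    int maxBlocksPerLockHold() {
        return maxBlocksPerLockHold;
    }

    int remaining() {
        return pendingBlocks.size();
    }
}
```

With 350K blocks, drain(Integer.MAX_VALUE) does everything in one lock hold, 
while drain(1000) spreads the same removals over 350 short holds, so other 
operations can take the lock in between.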

Balancing is actually not a concern for a large cluster (>3k datanodes) with 
high traffic since other blocks will soon fill up this new datanode.

> Invalidate block on decommissioning DataNode after replication
> --------------------------------------------------------------
>
>                 Key: HDFS-15634
>                 URL: https://issues.apache.org/jira/browse/HDFS-15634
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Fengnan Li
>            Assignee: Fengnan Li
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: write lock.png
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Right now when a DataNode starts decommission, Namenode will mark it as 
> decommissioning and its blocks will be replicated over to different 
> DataNodes, then marked as decommissioned. These blocks are not touched since 
> they are not counted as live replicas.
> Proposal: Invalidate these blocks once they are replicated and there are 
> enough live replicas in the cluster.
> Reason: A recent shutdown of decommissioned datanodes to finish the flow 
> caused a Namenode latency spike, since the namenode needs to remove all of 
> the blocks from its memory and this step requires holding the write lock. If 
> we had gradually invalidated these blocks, the deletion would have been much 
> easier and faster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
