[ https://issues.apache.org/jira/browse/HDFS-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085562#comment-17085562 ]
zhengchenyu commented on HDFS-14186: ------------------------------------

We are also faced with a huge cluster that has 639M total filesystem objects, and we likewise suffer a block report storm when restarting the namenode. To avoid restarting all datanodes, we worked around the problem by limiting the block report queue (a rough sketch of that idea is appended at the end of this message). Still, I think the lifeline protocol is the best way to solve this problem, so we will merge the lifeline code into our cluster in the future (a hedged configuration sketch is also appended below).

However, I don't think exiting safe mode early is a good idea. In our experience, once safe mode exits on the active namenode, processMisReplicatesAsync and the ReplicationMonitor start running. That causes lock contention, and many BlockReceivedAndDeleted calls end up stuck behind the queued block operations (BlockOpsQueued). I would even hold safe mode until BlockOpsQueued is no longer busy. (Note: in our scenario we only restart the namenode, and we make sure BlockReceivedAndDeleted is called after the block report when the namenode restarts.)

> blockreport storm slow down namenode restart seriously in large cluster
> -----------------------------------------------------------------------
>
>                 Key: HDFS-14186
>                 URL: https://issues.apache.org/jira/browse/HDFS-14186
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.7.1
>            Reporter: Xiaoqiao He
>            Assignee: Xiaoqiao He
>            Priority: Major
>        Attachments: HDFS-14186.001.patch
>
> In the current implementation, a datanode sends its block report immediately after it registers with the namenode on restart, and the resulting block report storm puts the namenode under heavy load. One consequence is that some received RPCs have to be dropped because their queue time exceeds the timeout. If a datanode's heartbeat RPCs keep getting skipped for long enough (default heartbeatExpireInterval=630s), the datanode is marked DEAD; it then has to re-register and send its block report again, which aggravates the block report storm, traps the cluster in a vicious circle, and seriously slows namenode startup (an hour or more), especially in a large (several thousand datanodes) and busy cluster. Although there has been much work to optimize namenode startup, the issue still exists.
> I propose to postpone the dead-datanode check until the namenode has finished startup.
> Any comments and suggestions are welcome.
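
A purely hypothetical sketch of the "limit the block report queue" workaround mentioned in my comment above: cap the number of full block reports the namenode processes at once and ask late callers to retry. This is not the change we actually applied; the class and method names are invented for illustration only.

    // Hypothetical illustration only: throttle concurrent full block report
    // processing with a semaphore; names here are invented for the sketch.
    import java.util.concurrent.Semaphore;

    public class BlockReportThrottle {
      private final Semaphore permits;

      public BlockReportThrottle(int maxConcurrentReports) {
        this.permits = new Semaphore(maxConcurrentReports);
      }

      // Returns true if the caller may process a full block report now;
      // a false return means "tell the datanode to retry later".
      public boolean tryAcquire() {
        return permits.tryAcquire();
      }

      // Call once the report has been processed (or rejected downstream).
      public void release() {
        permits.release();
      }
    }

Our real change sits inside the namenode's report-processing path; the sketch only shows the shape of the throttle.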
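
For the lifeline protocol (HDFS-9239) mentioned above, a minimal configuration sketch, assuming the standard lifeline properties from hdfs-default.xml; the host, port, and numeric values below are placeholders, not recommendations.

    // Minimal sketch, assuming the lifeline properties from HDFS-9239;
    // host/port and numeric values are placeholders.
    import org.apache.hadoop.conf.Configuration;

    public class LifelineConfigSketch {
      public static Configuration withLifeline(Configuration conf) {
        // Setting a dedicated lifeline RPC address on the namenode enables the protocol.
        conf.set("dfs.namenode.lifeline.rpc-address", "nn.example.com:8050");
        // Fraction of dfs.namenode.handler.count reserved for lifeline handlers.
        conf.setDouble("dfs.namenode.lifeline.handler.ratio", 0.10);
        // How often each datanode sends a lifeline message, in milliseconds.
        conf.setLong("dfs.datanode.lifeline.interval.ms", 9000L);
        return conf;
      }
    }

The point is that liveness messages travel on their own handler pool, so a congested block report queue no longer pushes datanodes into heartbeat expiry.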
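
On the 630s figure in the quoted description: as far as I know, the namenode computes heartbeat expiry as 2 * recheck interval + 10 * heartbeat interval, which gives 630s with the stock defaults. A small sketch of the arithmetic:

    // Sketch of where heartbeatExpireInterval=630s comes from, using the
    // default values of dfs.namenode.heartbeat.recheck-interval (300000 ms)
    // and dfs.heartbeat.interval (3 s).
    public class HeartbeatExpirySketch {
      public static void main(String[] args) {
        long recheckIntervalMs = 300_000L;  // dfs.namenode.heartbeat.recheck-interval
        long heartbeatIntervalSec = 3L;     // dfs.heartbeat.interval
        long expireMs = 2 * recheckIntervalMs + 10 * 1000 * heartbeatIntervalSec;
        System.out.println(expireMs / 1000 + " s");  // prints 630 s
      }
    }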