[ 
https://issues.apache.org/jira/browse/HDFS-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735986#comment-16735986
 ] 

Kihwal Lee edited comment on HDFS-14186 at 1/7/19 3:57 PM:
-----------------------------------------------------------

I think nodes are already not marked "dead" in the startup safe mode. But the 
RPC timeout will make datanodes resend the full block reports. Are your 
datanodes configured to break up the reports per storage and send them one by 
one? At least that will make each RPC payload (report) smaller and reduce the 
wasted work/heap.  In the 8-hour case, how much was the GC overhead? How many 
blocks were involved?  Was it replaying edits during the startup? I.e., did it 
have to hold a long lock to replay edits while reports piled up? Was the retry 
cache on?
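
For reference, per-storage splitting of block reports is controlled by the 
dfs.blockreport.split.threshold property. A minimal hdfs-site.xml sketch, 
assuming stock Hadoop property names (a value of 0 always sends a separate 
block report RPC per storage; the default of 1000000 splits only above that 
many blocks):

  <property>
    <name>dfs.blockreport.split.threshold</name>
    <!-- 0 = always send one block report RPC per storage -->
    <value>0</value>
  </property>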



> blockreport storm slow down namenode restart seriously in large cluster
> -----------------------------------------------------------------------
>
>                 Key: HDFS-14186
>                 URL: https://issues.apache.org/jira/browse/HDFS-14186
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: He Xiaoqiao
>            Assignee: He Xiaoqiao
>            Priority: Major
>
> In the current implementation, each datanode sends a block report immediately 
> after it successfully registers with the namenode on restart, and the 
> resulting block report storm puts the namenode under high load. One 
> consequence is that some received RPCs have to be dropped because they time 
> out while waiting in the call queue. If a datanode's heartbeat RPCs keep 
> being dropped for long enough (default heartbeatExpireInterval=630s), the 
> node is marked DEAD; it then has to re-register and send its block report 
> again, which aggravates the block report storm and traps the cluster in a 
> vicious circle, slowing namenode startup severely (by an hour or even more), 
> especially in a large (several thousands of datanodes) and busy cluster. 
> Although much work has been done to optimize namenode startup, the issue 
> still exists.
> I propose to postpone the dead datanode check until the namenode has 
> finished startup.
> Any comments and suggestions are welcome.
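
For context on the 630s default mentioned above: assuming the stock settings 
dfs.namenode.heartbeat.recheck-interval=300s and dfs.heartbeat.interval=3s, 
the expiry window works out to roughly

  2 * recheck-interval + 10 * heartbeat-interval = 2 * 300s + 10 * 3s = 630s

so a datanode whose heartbeats go unprocessed for that long is declared dead.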



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
