[ https://issues.apache.org/jira/browse/HDFS-14186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742785#comment-16742785 ]
Íñigo Goiri commented on HDFS-14186:
------------------------------------

For my own sanity, the path for the lifeline is the following:
# {{NameNodeRpcServer#sendLifeline()}} in a different RPC handler.
# {{FSNamesystem#handleLifeline()}} with no lock.
# {{DatanodeManager#handleLifeline()}} uses {{getDatanode()}}, and there are no locks or anything in these two.
# {{HeartbeatManager#updateLifeline()}} is synchronized within the object.
# {{BlockManager#updateHeartbeatState()}} is unlocked and uses {{DatanodeDescriptor#updateHeartbeatState()}}, which seems fine.

So the point of conflict is {{HeartbeatManager#updateLifeline()}}, which fights with {{register()}} and {{updateHeartbeat()}}. From your description, I'm guessing that these two functions are the ones taking a long time.

I'm not very familiar with {{synchronized}}, but it looks like it doesn't impose any particular acquisition order. Could we change the {{HeartbeatManager}} locking model there? There was some discussion about this in HDFS-9239, but it doesn't look like it made it very far. I have to say that it is very tempting to make part of {{HeartbeatManager#updateLifeline()}} not synchronized and just update the timestamp there when the load is high (see the sketches after the quoted description below).

> blockreport storm slow down namenode restart seriously in large cluster
> ------------------------------------------------------------------------
>
>                 Key: HDFS-14186
>                 URL: https://issues.apache.org/jira/browse/HDFS-14186
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: He Xiaoqiao
>            Assignee: He Xiaoqiao
>            Priority: Major
>         Attachments: HDFS-14186.001.patch
>
> In the current implementation, a datanode sends its block report immediately after registering with the namenode on restart, and the resulting block report storm puts the namenode under heavy load. One consequence is that some received RPCs have to be dropped because their queue time exceeds the timeout. If a datanode's heartbeat RPCs keep being dropped for long enough (the default heartbeatExpireInterval is 630s), the node is marked DEAD and has to re-register and send its block report again, which aggravates the storm and traps the cluster in a vicious circle, seriously slowing down namenode startup (by an hour or more), especially in a large (several thousand datanodes) and busy cluster. Although there has been a lot of work to optimize namenode startup, the issue still exists.
> I propose postponing the dead-datanode check until the namenode has finished startup.
> Any comments and suggestions are welcome.
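To make that last idea concrete, here is a minimal sketch, not a patch: {{LifelineSketch}}, {{highLoad}}, and {{lastSeenMs}} are simplified stand-ins I made up, not the real fields of {{HeartbeatManager}}. It only shows the shape of "refresh the timestamp lock-free when load is high, otherwise take the monitor as today":

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only -- not the real HeartbeatManager. The real class keeps
// stats and storage reports under its monitor; this shows only the
// "update just the timestamp without the lock" fast path.
class LifelineSketch {
  private final ConcurrentHashMap<String, AtomicLong> lastSeenMs =
      new ConcurrentHashMap<>();

  // Stand-in for whatever "load is high" signal the real patch would use.
  private volatile boolean highLoad;

  void updateLifeline(String datanodeUuid, long nowMs) {
    if (highLoad) {
      // Fast path: refresh liveness only, no monitor, so lifelines stop
      // queueing behind register() and updateHeartbeat().
      lastSeenMs.computeIfAbsent(datanodeUuid, k -> new AtomicLong(nowMs))
          .set(nowMs);
      return;
    }
    synchronized (this) {
      // Slow path: today's behavior -- the full update under the
      // HeartbeatManager monitor, contending with register() and
      // updateHeartbeat().
      lastSeenMs.computeIfAbsent(datanodeUuid, k -> new AtomicLong(nowMs))
          .set(nowMs);
      // ... plus the stats/storage updates the real code does here ...
    }
  }
}
{code}

The trade-off is that a fast-path lifeline can now race {{register()}}; since it only refreshes liveness, a slightly stale timestamp should be benign, but that ordering question is essentially what HDFS-9239 was circling.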
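And for the proposal in the description, a similar sketch of the "postpone the dead check until startup finishes" idea; {{startupComplete}} is a hypothetical signal (the real patch would presumably key off safe mode or startup state in {{DatanodeManager}}), not an existing API:

{code:java}
// Sketch only: skip declaring datanodes dead while the namenode is still
// starting up, so heartbeats dropped during the block report storm do not
// trigger re-registration and a second storm.
class DeadNodeCheckSketch {
  // 2 * recheck (300s) + 10 * heartbeat (3s) = 630s by default in HDFS.
  private static final long HEARTBEAT_EXPIRE_INTERVAL_MS = 630_000L;

  // Hypothetical flag flipped once namenode startup has finished.
  private volatile boolean startupComplete;

  boolean isDatanodeDead(long lastHeartbeatMs, long nowMs) {
    if (!startupComplete) {
      // Dropped heartbeats are expected during startup; defer the check.
      return false;
    }
    return nowMs - lastHeartbeatMs > HEARTBEAT_EXPIRE_INTERVAL_MS;
  }
}
{code}

Nodes that are genuinely gone would still be expired on the first monitor pass after startup completes.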