[ https://issues.apache.org/jira/browse/HDFS-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866216#comment-16866216 ]

He Xiaoqiao commented on HDFS-14576:
------------------------------------

Thanks [~sodonnell] for your comments.
{quote}we used to see a lot of issues like this, but in later CDH versions 
several patches have been backported that made the initial block report problem 
largely disappear.  
{quote}
In my experience, HDFS-6763 and HDFS-7097 have an obvious positive effect on 
restart time. Of course, there are other JIRAs that improve this as well. It 
would be helpful if you could dig up and share the JIRAs that were backported 
into the CDH versions.
{quote}Have you investigated using dfs.blockreport.initialDelay for the 
datanodes? I believe that will cause the datanode to delay its initial block 
report by a random interval between zero and that setting.{quote}
Right, tuning the `dfs.blockreport.initialDelay` parameter is one way to 
mitigate this issue, and I apply this configuration as well. But I don't think 
it is a general solution, since startup time keeps growing as the cluster 
grows, and we have to keep revisiting and re-tuning this parameter.
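For reference, setting it looks like the snippet below; this is only a minimal 
hdfs-site.xml sketch, and the 600-second value is illustrative rather than a 
recommendation from this JIRA:
{code:xml}
<!-- Delay each DataNode's first block report after restart by a random
     interval between 0 and this many seconds, spreading load on the NameNode.
     The default is 0 (no delay); 600 here is only an illustrative value. -->
<property>
  <name>dfs.blockreport.initialDelay</name>
  <value>600</value>
</property>
{code}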
{quote}what version of HDFS are you running where you see these problems{quote}
We run branch-2.7; after checking trunk, I believe this issue still exists there. Thanks again.

> Avoid block report retry and slow down namenode startup
> -------------------------------------------------------
>
>                 Key: HDFS-14576
>                 URL: https://issues.apache.org/jira/browse/HDFS-14576
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: He Xiaoqiao
>            Assignee: He Xiaoqiao
>            Priority: Major
>
> During namenode startup, the load will be very high since it has to process 
> every datanode's block report one by one. If there are hundreds of datanode 
> block reports pending processing, the issue becomes even more serious, even 
> though #processFirstBlockReport handles them a lot more efficiently than 
> ordinary block reports. Some datanodes will then retry their block reports, 
> which lengthens restart time. I think we should filter out block report 
> requests (from datanode block report retries) that have already been 
> processed and return directly, shortening restart time. Note that this 
> proposal may make an obvious difference only on large clusters.
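
A minimal sketch of the filtering idea described above, assuming the check 
would sit near BlockManager#processReport and reuse the existing 
Namesystem#isInStartupSafeMode() and DatanodeStorageInfo#getBlockReportCount() 
accessors; the helper class itself is hypothetical, not a committed patch:
{code:java}
package org.apache.hadoop.hdfs.server.blockmanagement;

import org.apache.hadoop.hdfs.server.namenode.Namesystem;

/**
 * Hypothetical helper illustrating the proposal: while the NameNode is still
 * starting up, a retried full block report from a storage whose first report
 * has already been applied can be acknowledged immediately instead of being
 * walked block by block again.
 */
final class InitialBlockReportFilter {
  private InitialBlockReportFilter() {}

  /** Returns true when a retried report can be skipped and answered directly. */
  static boolean canSkipRetriedReport(Namesystem namesystem,
                                      DatanodeStorageInfo storage) {
    return namesystem.isInStartupSafeMode()
        && storage.getBlockReportCount() > 0;
  }
}
{code}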



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
