[ https://issues.apache.org/jira/browse/HDFS-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887652#comment-16887652 ]

He Xiaoqiao commented on HDFS-14576:
------------------------------------

[~zhangchen], thanks for getting involved.
I have already optimized NameNode restart time and obtained good results by doing the following:
1. Backport patches including HDFS-6763, HDFS-7097, HDFS-7980, HDFS-7503 and others.
2. Turn off automatic safemode leave; see HDFS-14186 for the detailed discussion.
3. Force block reports to be split per disk (see the configuration sketch after this list).
4. Some other minor changes.
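As a rough illustration of items 2 and 3 above, the hdfs-site.xml sketch below uses two existing configuration keys; the values are examples I picked for illustration, not taken from this issue. Raising {{dfs.namenode.safemode.threshold-pct}} above 1.0 keeps the NameNode in safe mode until an operator runs {{hdfs dfsadmin -safemode leave}}, and setting {{dfs.blockreport.split.threshold}} to 0 makes each DataNode send a separate block report RPC per storage.
{code:xml}
<!-- hdfs-site.xml: illustrative values only, adjust for your cluster -->
<property>
  <!-- Keep the threshold above 100% so the NameNode never leaves safe mode
       automatically; an operator leaves it manually with
       `hdfs dfsadmin -safemode leave` once startup processing has caught up. -->
  <name>dfs.namenode.safemode.threshold-pct</name>
  <value>1.5</value>
</property>
<property>
  <!-- 0 forces each DataNode to send one block report RPC per storage (disk)
       instead of a single combined report for all disks. -->
  <name>dfs.blockreport.split.threshold</name>
  <value>0</value>
</property>
{code}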
However, I don't think that is the end of it. Some further optimizations I have in mind:
1. Optimize the safemode leave mechanism, HDFS-14559, and the [demo patch|https://issues.apache.org/jira/secure/attachment/12954188/HDFS-14186.001.patch].
2. Avoid useless block report retries to reduce NameNode load during restart; in my experience almost 30% of block report ops are retries from DataNodes for reports that, from the NameNode's point of view, have already been processed (a rough sketch follows this list).
3. Report blocks at a finer granularity than a single DataNode disk.
4. Improve the efficiency of processing block report RPCs at startup (do not discard timed-out RPCs, increase queue capacity, have the NameNode trigger retries of failed reports, etc.).
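To make item 2 concrete, here is a minimal sketch of the idea, not actual HDFS source: the NameNode remembers the id of the last fully processed full block report per storage, so a retried report with the same id can be acknowledged immediately instead of being re-processed. The class and method names below are hypothetical.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch (not HDFS code): remember which full block report was
 * last applied for each storage so that a DataNode retry of the same report
 * can be answered directly, without walking the block list again.
 */
public class BlockReportDeduplicator {

  /** storageUuid -> id of the last block report that was fully processed. */
  private final Map<String, Long> lastProcessedReportId = new ConcurrentHashMap<>();

  /**
   * True if this (storage, reportId) pair was already processed; the RPC
   * handler could then return success immediately so the DataNode stops
   * retrying.
   */
  public boolean isDuplicate(String storageUuid, long reportId) {
    Long last = lastProcessedReportId.get(storageUuid);
    return last != null && last == reportId;
  }

  /** Record that the report has been fully applied to the blocks map. */
  public void markProcessed(String storageUuid, long reportId) {
    lastProcessedReportId.put(storageUuid, reportId);
  }
}
{code}
In the block report RPC path, isDuplicate would be checked before queueing the report for processing, and markProcessed called once processing completes.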
I am sure this is not the best or most complete solution; more discussion is welcome.

> Avoid block report retry and slow down namenode startup
> -------------------------------------------------------
>
>                 Key: HDFS-14576
>                 URL: https://issues.apache.org/jira/browse/HDFS-14576
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: He Xiaoqiao
>            Assignee: He Xiaoqiao
>            Priority: Major
>
> During NameNode startup, the load is very high since the NameNode has to 
> process every DataNode's block report one by one. If there are hundreds of 
> DataNode block reports pending, the issue becomes even more serious, although 
> #processFirstBlockReport handles them much more efficiently than ordinary 
> block reports. Some DataNodes will then retry their block reports, which 
> lengthens restart time. I think we should filter out block report requests 
> (arriving via DataNode block report retries) that have already been processed 
> and return directly, thus shortening restart time. Note that this proposal is 
> probably only significant for large clusters.


