[ 
https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325162#comment-14325162
 ] 

Colin Patrick McCabe commented on HDFS-7784:
--------------------------------------------

At the end of the day, there are situations where you have to restart both 
NameNodes.  For example, you might have hit a bug that causes both the standby 
and the active to crash.  We've had bugs like that in the past.  So I do think 
this is an important improvement.

I think the discussion here has been a little too dismissive.  Some people are 
regularly spending 10 minutes to load their big fsimages... I don't think those 
people would write off a 2x (or 2.5x) speedup as "not good enough."

I do think [~wheat9]'s point about avoiding complexity is good.  Can we get 
some benefit just by doing a really large amount of readahead?  For example, if 
we had a background thread that ran concurrently and did nothing but read the 
fsimage from start to finish, it would "warm up the buffer cache" for the 
loading thread.  This would mean that our single-threaded loading process 
would spend less time waiting for disk I/O.  Maybe try that out and see what 
the numbers look like on a really big fsimage (something like 5-7 GB).
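
A minimal sketch of the readahead idea, to make it concrete.  This is not code 
from the patch or from the NameNode: the class name FsImageReadahead, the 
methods warmUp and startWarmUp, and the 4 MB buffer size are all assumptions 
made for illustration.  The thread reads the file sequentially and throws the 
bytes away, relying on the OS page cache to keep them around for the loader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Hadoop.
final class FsImageReadahead {
    /**
     * Reads the whole file sequentially and discards the bytes. The only goal
     * is to pull the file's pages into the OS buffer cache.
     * Returns the number of bytes read.
     */
    static long warmUp(Path image) throws IOException {
        byte[] buf = new byte[4 * 1024 * 1024]; // assumed buffer size
        long total = 0;
        try (InputStream in = Files.newInputStream(image)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Starts the warm-up on a daemon thread so it never blocks JVM exit. */
    static Thread startWarmUp(Path image) {
        Thread t = new Thread(() -> {
            try {
                warmUp(image);
            } catch (IOException e) {
                // Best-effort only: the real loader will hit and report
                // the same I/O error on its own read.
            }
        }, "fsimage-readahead");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

The loader would call startWarmUp(path) just before it begins its normal 
single-threaded read.  Whether this helps depends on the image fitting in free 
memory and on the loader being I/O-bound rather than CPU-bound, which is what 
the measurement on a large fsimage would show.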

> load fsimage in parallel
> ------------------------
>
>                 Key: HDFS-7784
>                 URL: https://issues.apache.org/jira/browse/HDFS-7784
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Walter Su
>            Assignee: Walter Su
>            Priority: Minor
>         Attachments: HDFS-7784.001.patch, test-20150213.pdf
>
>
> When a single NameNode has a huge number of files and federation is not used, 
> startup/restart is slow. The fsimage loading step takes most of the time. 
> fsimage loading can be separated into two parts: deserialization and object 
> construction (mostly map insertion). Deserialization takes most of the CPU 
> time, so we can do deserialization in parallel and add to the hashmap serially. 
> It will significantly reduce the NN start time.
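
For reference, the "deserialize in parallel, insert serially" pattern from the 
description above can be sketched as follows.  This is not the attached patch: 
the Record class, the "id:name" byte format, and the parse and load methods are 
hypothetical stand-ins for the real inode types and protobuf decoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration, not HDFS code.
final class ParallelDeserialize {
    /** Stand-in for an inode record; not the HDFS INode class. */
    static final class Record {
        final long id;
        final String name;
        Record(long id, String name) { this.id = id; this.name = name; }
    }

    /** Assumed wire format "id:name" in UTF-8; this is the CPU-heavy step. */
    static Record parse(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8);
        int sep = s.indexOf(':');
        return new Record(Long.parseLong(s.substring(0, sep)), s.substring(sep + 1));
    }

    static Map<Long, Record> load(List<byte[]> rawRecords, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Phase 1: deserialization runs on the worker threads.
            List<Future<Record>> parsed = new ArrayList<>();
            for (byte[] raw : rawRecords) {
                parsed.add(pool.submit(() -> parse(raw)));
            }
            // Phase 2: only the calling thread touches the map, so a plain
            // HashMap is safe and insertion order matches the input order.
            Map<Long, Record> map = new HashMap<>();
            for (Future<Record> f : parsed) {
                Record r = f.get();
                map.put(r.id, r);
            }
            return map;
        } finally {
            pool.shutdown();
        }
    }
}
```

One task per record keeps the sketch short; a real loader would likely submit 
batches of records to keep scheduling overhead down.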



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
