[ https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325162#comment-14325162 ]
Colin Patrick McCabe commented on HDFS-7784:
--------------------------------------------

At the end of the day, there are situations where you have to restart both NameNodes. For example, you might have hit a bug that causes both the standby and the active to crash. We've had bugs like that in the past. So I do think this is an important improvement.

I think the discussion here has been a little too dismissive. Some people are regularly spending 10 minutes to load their big fsimages... I don't think those people would write off a 2x (or 2.5x) speedup as "not good enough."

I do think [~wheat9]'s point about avoiding complexity is good. Can we get some benefit just by doing a really large amount of readahead? For example, if we had a background thread that ran concurrently and did nothing but read the FSImage from start to finish, it would "warm up the buffer cache" for the other thread. This would mean that our single-threaded loading process would spend less time waiting for disk I/O. Maybe try that out and see what the numbers look like on a really big fsimage (something like 5-7 GB).

> load fsimage in parallel
> ------------------------
>
>                 Key: HDFS-7784
>                 URL: https://issues.apache.org/jira/browse/HDFS-7784
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Walter Su
>            Assignee: Walter Su
>            Priority: Minor
>         Attachments: HDFS-7784.001.patch, test-20150213.pdf
>
>
> When a single NameNode has a huge number of files and federation is not used,
> startup/restart is slow. The fsimage loading step takes most of the time.
> fsimage loading can be separated into two parts: deserialization and object
> construction (mostly map insertion). Deserialization takes most of the CPU
> time, so we can do deserialization in parallel and add to the hashmap serially.
> This will significantly reduce the NN start time.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
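
For anyone wanting to try the readahead idea from the comment above, here is a minimal sketch of what such a background thread might look like. It is not taken from the attached patch or from Hadoop: the class name FsImageReadahead, its two methods, and the 4 MB chunk size are all hypothetical. It also assumes the image is read through the normal OS page cache and that the cache is large enough to hold the pages until the real loader reaches them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not part of Hadoop): reads a file sequentially to warm the OS page cache. */
final class FsImageReadahead {

    /** Reads the whole file from start to finish, discarding the bytes. Returns the byte count. */
    static long warmUp(Path file, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n; // the data itself is thrown away; only the side effect on the cache matters
            }
        }
        return total;
    }

    /** Starts the warm-up on a daemon thread so it cannot block the real loader or JVM exit. */
    static Thread startInBackground(Path file) {
        Thread t = new Thread(() -> {
            try {
                warmUp(file, 4 << 20); // 4 MB chunks; an assumed value that would need tuning
            } catch (IOException e) {
                // Best effort only: the real loader will report any I/O problem itself.
            }
        }, "fsimage-readahead");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

The existing single-threaded loader would call FsImageReadahead.startInBackground(imagePath) just before it opens the image and then proceed unchanged. Whether this helps at all depends on the loader being disk-bound rather than CPU-bound, which is exactly what the suggested measurement on a 5-7 GB fsimage would show.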