[ https://issues.apache.org/jira/browse/HDFS-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319115#comment-14319115 ]
Haohui Mai edited comment on HDFS-7784 at 2/12/15 10:32 PM:
------------------------------------------------------------

I've done some experiments in HDFS-5698. Parallelism does improve performance; however, my feeling is that the improvement is *not* significant enough to justify the complexity, especially since a single race or bug here could easily lead to data loss.

was (Author: wheat9):
I've done some experiments in HDFS-5698. Parallelism does improve the performance, however, my feeling is that the improvement is significant enough to justify the complexity, especially having one race / bug here could easily lead to data loss.

> load fsimage in parallel
> ------------------------
>
>                 Key: HDFS-7784
>                 URL: https://issues.apache.org/jira/browse/HDFS-7784
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Walter Su
>            Assignee: Walter Su
>
> When a single Namenode holds a huge number of files, without using federation,
> startup/restart is slow. The fsimage loading step takes most of the time.
> fsimage loading can be separated into two parts: deserialization and object
> construction (mostly map insertion). Deserialization takes most of the CPU
> time, so we can do deserialization in parallel and add to the hashmap serially.
> It will significantly reduce the NN start time.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
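For readers following the thread, the split proposed in the issue description (deserialize in parallel, insert into the map serially) can be sketched roughly as below. This is only an illustration under stated assumptions, not the HDFS loader and not any attached patch: the class `ParallelLoadSketch`, the `Record` type, and the `id:name` record format are all hypothetical stand-ins, and the real fsimage format is different.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLoadSketch {
    /** Hypothetical stand-in for a deserialized inode; not an HDFS class. */
    static final class Record {
        final long id;
        final String name;

        Record(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Parses the made-up "id:name" format (assumption; not the fsimage format). */
    static Record deserialize(byte[] raw) {
        String s = new String(raw, StandardCharsets.UTF_8);
        int sep = s.indexOf(':');
        return new Record(Long.parseLong(s.substring(0, sep)), s.substring(sep + 1));
    }

    /** Deserializes records on a thread pool, then fills the map on the calling thread. */
    static Map<Long, String> load(List<byte[]> rawRecords, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Parallel phase: CPU-bound deserialization, no shared mutable state.
            List<Future<Record>> pending = new ArrayList<>();
            for (byte[] raw : rawRecords) {
                Callable<Record> task = () -> deserialize(raw);
                pending.add(pool.submit(task));
            }
            // Serial phase: only this thread touches the map, so a plain HashMap is safe.
            Map<Long, String> inodeMap = new HashMap<>();
            for (Future<Record> f : pending) {
                Record r = f.get(); // rethrows any deserialization failure
                inodeMap.put(r.id, r.name);
            }
            return inodeMap;
        } finally {
            pool.shutdown();
        }
    }
}
```

Keeping every map mutation on one thread avoids races on the map itself, but a real loader also has ordering and error-handling concerns that this sketch ignores, which is where the complexity and data-loss risk raised in the comment come in.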