[ 
https://issues.apache.org/jira/browse/HDFS-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890285#comment-16890285
 ] 

Stephen O'Donnell commented on HDFS-13693:
------------------------------------------

I think this performance improvement is a great discovery, but the change does 
carry some future risk: if something changes in how the image is loaded, it 
would be easy to overlook the ordering assumption this optimization relies on. 
However, most changes involve some risk, and this does give a decent speed 
improvement, so it's probably worth it.

I tried this change in my testing around loading the fsimage in parallel in 
HDFS-14617. In the single-threaded case, the directory section load time 
improved by about 35 seconds (from 326 to 291 seconds), but when I moved to 
parallel loading (4 threads), this change had negligible impact. That is 
probably because the work was spread out over more threads and there are other 
points of serialization that slow things down.

I am happy for this to go in but thought it was worth highlighting the above.

> Remove unnecessary search in INodeDirectory.addChild during image loading
> -------------------------------------------------------------------------
>
>                 Key: HDFS-13693
>                 URL: https://issues.apache.org/jira/browse/HDFS-13693
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: zhouyingchao
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-13693-001.patch, HDFS-13693-002.patch, 
> HDFS-13693-003.patch, HDFS-13693-004.patch, HDFS-13693-005.patch
>
>
> In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added 
> to their parent INode's map one by one. The add procedure searches for a 
> position in the parent's map and then inserts the child at that position. 
> However, during image loading the search is unnecessary, since the insert 
> position should always be at the end of the map given the order in which the 
> children are serialized on disk.
> Testing this patch against an fsimage of a 70 PB cluster (200 million files 
> and 300 million blocks), the image loading time was reduced from 1210 seconds 
> to 1138 seconds, so it can reduce loading time by up to about 10%.
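
The idea in the description above can be sketched roughly as follows. This is 
only an illustration, not the patch itself: SortedChildren, addChild and 
appendChild are hypothetical names, and plain Strings stand in for INodes. It 
assumes children arrive already sorted during loading and falls back to the 
searching path if that assumption does not hold.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a directory's sorted child list (not the real INodeDirectory). */
public class SortedChildren {
    private final List<String> children = new ArrayList<>();

    /** General path: binary-search for the insert position, then insert there. */
    public boolean addChild(String name) {
        int idx = Collections.binarySearch(children, name);
        if (idx >= 0) {
            return false; // a child with this name already exists
        }
        // binarySearch returns -(insertionPoint) - 1 when the key is absent
        children.add(-idx - 1, name);
        return true;
    }

    /** Load path (assumption: names arrive in sorted order), so append without searching. */
    public boolean appendChild(String name) {
        if (!children.isEmpty()
                && children.get(children.size() - 1).compareTo(name) >= 0) {
            // Ordering assumption violated: fall back to the searching path.
            return addChild(name);
        }
        children.add(name);
        return true;
    }

    /** Read-only view of the children in sorted order. */
    public List<String> list() {
        return Collections.unmodifiableList(children);
    }

    public static void main(String[] args) {
        String[] sortedNames = {"a", "b", "c", "d"};
        SortedChildren searched = new SortedChildren();
        SortedChildren appended = new SortedChildren();
        for (String n : sortedNames) {
            searched.addChild(n);
            appended.appendChild(n);
        }
        // Both paths should produce the same child order for sorted input.
        System.out.println(searched.list().equals(appended.list()));
    }
}
```

The single comparison against the last element is a cheap guard for the risk 
raised in the comment: if the load order ever changes, the sketch degrades to 
the slower path instead of corrupting the sort order.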



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
