[ https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168635#comment-17168635 ]
Chengwei Wang (smarthan) edited comment on HDFS-15493 at 7/31/20, 11:06 AM:
-----------------------------------------------------------------

After carefully reviewing the code that updates the blocks map and name cache, I found that it is feasible to start these updates when loading of the INodeSection begins, and to shut down the executors when loading of the INodeDirectorySection completes. That way, almost no time is spent waiting for the executors to terminate.

I have submitted a patch [^HDFS-15493.004.patch] based on this approach. It uses two single-thread executors and performs the updates without locking.

I tested this patch twice.
{code:java}
Test1.
20/07/31 18:27:50 INFO namenode.FSImageFormatPBINode: Completed loading all INodeDirectory sub-sections
20/07/31 18:27:50 INFO namenode.FSImageFormatPBINode: Completed update blocks map and name cache, total waiting duration: 1
20/07/31 18:27:51 INFO namenode.FSImageFormatProtobuf: Loaded FSImage in 367 seconds.

Test2.
20/07/31 18:48:03 INFO namenode.FSImageFormatPBINode: Completed loading all INodeDirectory sub-sections
20/07/31 18:48:03 INFO namenode.FSImageFormatPBINode: Completed update blocks map and name cache, total waiting duration: 1
20/07/31 18:48:04 INFO namenode.FSImageFormatProtobuf: Loaded FSImage in 363 seconds.
{code}
Based on my tests, this gives about a 20% speedup, reducing the load time from 460s+ to 360s+.

I think this patch may be the best choice. [~sodonnell], could you help me test it on trunk?

> Update block map and name cache in parallel while loading fsimage.
> ------------------------------------------------------------------
>
>                 Key: HDFS-15493
>                 URL: https://issues.apache.org/jira/browse/HDFS-15493
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Chengwei Wang
>            Priority: Major
>         Attachments: HDFS-15493.001.patch, HDFS-15493.002.patch,
> HDFS-15493.003.patch, HDFS-15493.004.patch, fsimage-loading.log
>
>
> While loading the INodeDirectorySection of the fsimage, the loader updates the name cache and
> block map after adding each inode file to its inode directory. Running these steps in parallel
> would reduce the time cost of fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617 applied, the time cost to load the
> fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost
> is reduced to 410s.
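
For readers without the patch at hand, the approach described in the comment (two single-thread executors that are fed while inodes are loaded and are shut down once loading completes) can be sketched roughly as below. This is only an illustration and is not taken from HDFS-15493.004.patch: the class `ParallelUpdateSketch`, its methods, and the two `HashMap`s standing in for the blocks map and name cache are all hypothetical. The updates need no lock here because each map is written by exactly one worker thread and is read only after both executors have terminated.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelUpdateSketch {
    // Hypothetical stand-ins for the NameNode's blocks map and name cache.
    // Each map is written only by its own single worker thread, so no lock is taken.
    private final Map<Long, Long> blocksMap = new HashMap<>();      // blockId -> inodeId
    private final Map<String, Integer> nameCache = new HashMap<>(); // name -> use count

    // One single-thread executor per structure, created before loading starts.
    private final ExecutorService blocksMapUpdater = Executors.newSingleThreadExecutor();
    private final ExecutorService nameCacheUpdater = Executors.newSingleThreadExecutor();

    /** Called by the loading thread for each file inode; the real work is queued, not done inline. */
    public void onInodeLoaded(long inodeId, String name, long[] blockIds) {
        blocksMapUpdater.execute(() -> {
            for (long blockId : blockIds) {
                blocksMap.put(blockId, inodeId);
            }
        });
        nameCacheUpdater.execute(() -> nameCache.merge(name, 1, Integer::sum));
    }

    /** Called once after the last section is loaded; returns the milliseconds spent waiting. */
    public long finish() {
        long start = System.nanoTime();
        blocksMapUpdater.shutdown();
        nameCacheUpdater.shutdown();
        try {
            // The one-hour timeout is an arbitrary choice for this sketch.
            if (!blocksMapUpdater.awaitTermination(1, TimeUnit.HOURS)
                    || !nameCacheUpdater.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("updaters did not terminate in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for updaters", e);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Only call these after finish() has returned.
    public int blockCount() {
        return blocksMap.size();
    }

    public int nameUses(String name) {
        return nameCache.getOrDefault(name, 0);
    }

    public static void main(String[] args) {
        ParallelUpdateSketch sketch = new ParallelUpdateSketch();
        sketch.onInodeLoaded(1L, "part-00000", new long[] {10L, 11L});
        sketch.onInodeLoaded(2L, "part-00000", new long[] {12L});
        long waitedMs = sketch.finish();
        System.out.println("blocks=" + sketch.blockCount()
                + ", uses(part-00000)=" + sketch.nameUses("part-00000")
                + ", waited(ms)=" + waitedMs);
    }
}
```

Because the queues drain while the loading thread is still reading inodes, the final wait in `finish()` should be short when the updaters keep up with the loader, which would be consistent with the small "total waiting duration" in the logs above.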