[ https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165980#comment-17165980 ]
Stephen O'Donnell commented on HDFS-15493:
------------------------------------------

I tested this change with an image of 9GB, 86M inodes and 74M blocks.

With parallel loading off and the new async loading off, my load time is about 384 seconds. Turning on only the new async block map loading reduces the load time to about 337 seconds.

With parallel loading on (4 threads and 12 sub-sections) and the async block map off, the load time is about 236 seconds. Finally, with both parallel loading and the async block map turned on, the load time increased to about 245 seconds.

Therefore, in my tests, this change slows down the parallel load slightly, but it does provide about a 13% speed-up with serial loading (384 seconds down to 337 seconds).

When you tested, are you sure the parallel loading from HDFS-14617 was enabled correctly, by first saving the image to create the sub-sections in the image index? If it is working correctly, you should see log messages like:

{code}
2020-07-27 20:21:06,566 INFO namenode.FSImageFormatProtobuf: The fsimage will be loaded in parallel using 4 threads
2020-07-27 20:21:06,611 INFO namenode.FSImageFormatPBINode: Loading the INode section in parallel with 12 sub-sections
2020-07-27 20:21:06,613 INFO namenode.FSImageFormatPBINode: Loading 86398618 INodes.
2020-07-27 20:21:10,855 INFO util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 3674ms
GC pool 'ParNew' had collection(s): count=1 time=4150ms
2020-07-27 20:22:49,827 INFO namenode.FSImageFormatPBINode: Completed loading all INode sections. Loaded 86398618 inodes.
2020-07-27 20:22:51,141 INFO namenode.FSImageFormatPBINode: Loading the INodeDirectory section in parallel with 12 sub-sections
2020-07-27 20:23:23,373 INFO namenode.FSImageFormatPBINode: Completed loading all INodeDirectory sub-sections
{code}

It would be very interesting to try my earlier suggestion of two single-threaded executors and see how it performs.
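For illustration only, the idea of two single-threaded executors could be sketched roughly as below: one executor applies block map updates and the other applies name cache updates, so the loading thread only enqueues work. Every name here (TwoExecutorLoadSketch, FileInode, blockMap, nameCache, load) is a hypothetical stand-in and not the NameNode's actual code or the patch's implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TwoExecutorLoadSketch {
    // Hypothetical stand-ins for the block map and the name cache.
    public static final Map<Long, String> blockMap = new ConcurrentHashMap<>();
    public static final Map<String, Integer> nameCache = new ConcurrentHashMap<>();

    /** Hypothetical minimal file inode: a name plus the IDs of its blocks. */
    public static final class FileInode {
        final String name;
        final long[] blockIds;

        public FileInode(String name, long... blockIds) {
            this.name = name;
            this.blockIds = blockIds;
        }
    }

    /** Enqueues both updates for every inode, then waits for the two queues to drain. */
    public static void load(List<FileInode> inodes) throws InterruptedException {
        // One dedicated thread per structure; tasks on each run in submission order.
        ExecutorService blockMapUpdater = Executors.newSingleThreadExecutor();
        ExecutorService nameCacheUpdater = Executors.newSingleThreadExecutor();
        try {
            for (FileInode inode : inodes) {
                blockMapUpdater.execute(() -> {
                    for (long id : inode.blockIds) {
                        blockMap.put(id, inode.name);
                    }
                });
                // Count how often each name occurs, as a toy name cache.
                nameCacheUpdater.execute(() -> nameCache.merge(inode.name, 1, Integer::sum));
            }
        } finally {
            // Stop accepting new tasks; already queued tasks still run.
            blockMapUpdater.shutdown();
            nameCacheUpdater.shutdown();
        }
        boolean drained = blockMapUpdater.awaitTermination(1, TimeUnit.MINUTES)
                && nameCacheUpdater.awaitTermination(1, TimeUnit.MINUTES);
        if (!drained) {
            throw new IllegalStateException("updates did not finish in time");
        }
    }
}
```

In this sketch each structure has exactly one writer thread, so its updates are applied in the order the loader submitted them, while the two kinds of update proceed concurrently with each other and with the loader.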
> Update block map and name cache in parallel while loading fsimage.
> ------------------------------------------------------------------
>
>                 Key: HDFS-15493
>                 URL: https://issues.apache.org/jira/browse/HDFS-15493
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Chengwei Wang
>            Priority: Major
>         Attachments: HDFS-15493.001.patch
>
>
> While loading the INodeDirectorySection of the fsimage, the loader updates the
> name cache and block map after adding each file inode to its inode directory.
> Running these steps in parallel would reduce the time cost of fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617, the time cost to load
> the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost
> is reduced to 410s.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org