[ 
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165980#comment-17165980
 ] 

Stephen O'Donnell commented on HDFS-15493:
------------------------------------------

I tested this change with an image of 9GB, 86M inodes and 74M blocks.

With both parallel loading and this new async loading off, my load time is 
about 384 seconds.

Turning on only the new async block map loading, the load time is reduced to 
about 337 seconds.

With parallel loading on (4 threads and 12 sub-sections) and the async block 
map off, the load time is about 236 seconds.

Finally, turning on both parallel loading and the async block map, the load 
time increased to about 245 seconds.

Therefore, in my tests, this change slows down the parallel load slightly, but 
it does provide about a 13% speed-up with serial loading.
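
As a quick check of the arithmetic, here is a minimal sketch that recomputes 
the relative changes from the four load times quoted above. The class and 
method names are made up for illustration. Note that the 47 second saving in 
the serial case is about 12% of the 384 second baseline and about 14% of the 
337 second result, so the "about 13%" figure depends on which base is used. 
The 9 second increase in the parallel case is roughly 4% of 236 seconds.

```java
class LoadTimeComparison {
    /** Percentage by which the candidate reduces load time relative to the baseline. */
    static double percentReduction(double baselineSeconds, double candidateSeconds) {
        return (baselineSeconds - candidateSeconds) / baselineSeconds * 100.0;
    }

    public static void main(String[] args) {
        // Load times in seconds, as reported in this comment.
        double serial = 384;
        double serialAsync = 337;
        double parallel = 236;
        double parallelAsync = 245;

        // A negative value means the candidate configuration was slower.
        System.out.printf("serial   -> serial + async:   %.1f%%%n",
                percentReduction(serial, serialAsync));
        System.out.printf("parallel -> parallel + async: %.1f%%%n",
                percentReduction(parallel, parallelAsync));
    }
}
```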

When you tested, are you sure the parallel loading in HDFS-14617 was enabled 
correctly, by first saving the image to create the sub-sections in the image 
index? If it is working correctly, you should see log messages like:

{code}
2020-07-27 20:21:06,566 INFO namenode.FSImageFormatProtobuf: The fsimage will be loaded in parallel using 4 threads
2020-07-27 20:21:06,611 INFO namenode.FSImageFormatPBINode: Loading the INode section in parallel with 12 sub-sections
2020-07-27 20:21:06,613 INFO namenode.FSImageFormatPBINode: Loading 86398618 INodes.
2020-07-27 20:21:10,855 INFO util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 3674ms
GC pool 'ParNew' had collection(s): count=1 time=4150ms
2020-07-27 20:22:49,827 INFO namenode.FSImageFormatPBINode: Completed loading all INode sections. Loaded 86398618 inodes.
2020-07-27 20:22:51,141 INFO namenode.FSImageFormatPBINode: Loading the INodeDirectory section in parallel with 12 sub-sections
2020-07-27 20:23:23,373 INFO namenode.FSImageFormatPBINode: Completed loading all INodeDirectory sub-sections
{code}
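
If it helps when comparing runs, a small check like the following could 
confirm that a given namenode log contains the parallel-loading messages. 
This is only a sketch: the class and method names are hypothetical, the two 
message fragments are copied from the log excerpt above, and it assumes each 
log entry sits on a single line.

```java
import java.util.List;

class ParallelLoadLogCheck {
    // Message fragments taken from the log excerpt in this comment.
    static final String PARALLEL_ENABLED =
            "The fsimage will be loaded in parallel using";
    static final String INODE_SUB_SECTIONS =
            "Loading the INode section in parallel with";

    /** Returns true only if both parallel-loading messages appear in the log. */
    static boolean parallelLoadingUsed(List<String> logLines) {
        boolean enabled = false;
        boolean subSections = false;
        for (String line : logLines) {
            if (line.contains(PARALLEL_ENABLED)) {
                enabled = true;
            }
            if (line.contains(INODE_SUB_SECTIONS)) {
                subSections = true;
            }
        }
        return enabled && subSections;
    }
}
```

The lines could be read with `java.nio.file.Files.readAllLines` on the 
namenode log file before being passed to `parallelLoadingUsed`.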

It would be very interesting to try my earlier suggestion of two 
single-threaded executors and see how it performs.
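
The earlier suggestion is not restated in this comment, so the following is 
only a rough sketch of the general shape, assuming one single-threaded 
executor for block map updates and another for name cache updates. None of 
this is Hadoop code: the class, the method and the two counters are 
hypothetical stand-ins for the real structures.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class TwoExecutorSketch {
    // Hypothetical stand-ins for the block map and name cache.
    final AtomicLong blockMapUpdates = new AtomicLong();
    final AtomicLong nameCacheUpdates = new AtomicLong();

    void load(int inodeCount) {
        // One thread per structure, so each structure only ever has one writer.
        ExecutorService blockMapExecutor = Executors.newSingleThreadExecutor();
        ExecutorService nameCacheExecutor = Executors.newSingleThreadExecutor();
        try {
            for (int i = 0; i < inodeCount; i++) {
                // The loading thread would add the inode to its directory here,
                // then hand the two follow-up updates to the executors.
                blockMapExecutor.execute(() -> blockMapUpdates.incrementAndGet());
                nameCacheExecutor.execute(() -> nameCacheUpdates.incrementAndGet());
            }
        } finally {
            // Stop accepting work; already queued tasks still run.
            blockMapExecutor.shutdown();
            nameCacheExecutor.shutdown();
        }
        try {
            // Loading is only complete once both queues have drained.
            if (!blockMapExecutor.awaitTermination(1, TimeUnit.MINUTES)
                    || !nameCacheExecutor.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("queued updates did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for updates", e);
        }
    }

    public static void main(String[] args) {
        TwoExecutorSketch sketch = new TwoExecutorSketch();
        sketch.load(1000);
        System.out.println("block map updates:  " + sketch.blockMapUpdates.get());
        System.out.println("name cache updates: " + sketch.nameCacheUpdates.get());
    }
}
```

Because each executor has a single thread, updates to a given structure are 
applied in submission order without extra locking on that structure. Whether 
that helps in practice would need to be measured against the numbers above.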

> Update block map and name cache in parallel while loading fsimage.
> ------------------------------------------------------------------
>
>                 Key: HDFS-15493
>                 URL: https://issues.apache.org/jira/browse/HDFS-15493
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Chengwei Wang
>            Priority: Major
>         Attachments: HDFS-15493.001.patch
>
>
> While loading the INodeDirectorySection of the fsimage, the name cache and 
> block map are updated after each inode file is added to its inode directory. 
> Running these steps in parallel would reduce the time cost of fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617, the time cost to 
> load the fsimage (220M files & 240M blocks) is 470s; with this patch, the 
> time cost is reduced to 410s.



