[jira] [Comment Edited] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.

Chengwei Wang (Jira) Tue, 28 Jul 2020 19:57:23 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166818#comment-17166818
 ]


Chengwei Wang edited comment on HDFS-15493 at 7/29/20, 2:56 AM:
----------------------------------------------------------------

Hi [~sodonnell], thanks for your detailed review and testing.
{quote}When you tested, are you sure the parallel loading in HDFS-14617 was 
enabled correctly, by first saving the image to create the sub-sections in the 
image index? If it is working correctly, you should see log messages like:
{quote}
I'm sure that the parallel loading was enabled correctly. I tested again 
yesterday following your test suggestions, and have attached a summary log 
here: [^fsimage-loading.log]

In my tests (240M inodes + 220M blocks), with async block updates enabled, the 
time to load the fsimage dropped from 467s to 420s (roughly 10%). So I guess 
the scale of the fsimage may be why the loading improvement is not obvious.
{quote}It would be very interesting to check the performance of my earlier 
suggestion with two single threaded executors and see how it performs.
{quote}
I tested loading the caches and blocks with two single-threaded executors. As 
in your test results, there was a long wait for the executors to terminate, so 
the time cost was no better than with one executor with four threads.
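
For readers following the thread, the two-executor arrangement discussed above 
can be sketched roughly as follows. This is only an illustration under stated 
assumptions, not the code in HDFS-15493.001.patch: the class name, the method 
name, and the plain maps standing in for the NameNode name cache and block map 
are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one single-threaded executor per structure to update.
class TwoExecutorSketch {

    // nameCache and blockMap are stand-ins (assumptions), not HDFS classes.
    static void load(int inodeCount,
                     Map<String, Integer> nameCache,
                     Map<Long, String> blockMap) {
        ExecutorService nameCacheExecutor = Executors.newSingleThreadExecutor();
        ExecutorService blockMapExecutor = Executors.newSingleThreadExecutor();

        // The loading thread only enqueues work; the updates run in the background.
        for (int i = 0; i < inodeCount; i++) {
            final String name = "file-" + (i % 10);
            final long blockId = i;
            nameCacheExecutor.execute(() -> nameCache.merge(name, 1, Integer::sum));
            blockMapExecutor.execute(() -> blockMap.put(blockId, name));
        }

        // Stop accepting tasks, then block until the queued updates drain.
        // This wait is where the extra time described in the thread is spent.
        nameCacheExecutor.shutdown();
        blockMapExecutor.shutdown();
        try {
            boolean done = nameCacheExecutor.awaitTermination(1, TimeUnit.MINUTES)
                    && blockMapExecutor.awaitTermination(1, TimeUnit.MINUTES);
            if (!done) {
                throw new IllegalStateException("updates did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> nameCache = new ConcurrentHashMap<>();
        Map<Long, String> blockMap = new ConcurrentHashMap<>();
        load(1000, nameCache, blockMap);
        System.out.println(nameCache.size() + " names, " + blockMap.size() + " blocks");
    }
}
```

If the loading thread enqueues tasks faster than the single worker threads can 
apply them, most of the saved time reappears in the awaitTermination calls.
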
{quote}If we could move the executor shutdown to the end of image loading, 
rather than wait on it, we would see a good improvement in the parallel case 
too. However, I am not sure if that is a safe thing to do - other sections may 
depend on the block map / cache being loaded fully when the inode directory 
section has completed.
{quote}
I agree this idea is better. I will check whether it is safe and post the test 
results.
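
The idea quoted above, deferring the wait on the executor until the end of 
image loading, might look something like the sketch below. It is an 
illustration only: the class, the method, and the section names are 
hypothetical, and whether later fsimage sections can really tolerate an 
incomplete block map is exactly the open question in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: wait for background updates only once, at the very end.
class DeferredShutdownSketch {

    // blockMap is a plain-map stand-in (assumption), not the HDFS block map.
    static List<String> loadImage(int inodeCount, Map<Long, String> blockMap) {
        List<String> sectionsLoaded = new ArrayList<>();
        ExecutorService updater = Executors.newSingleThreadExecutor();

        // "Inode directory section": enqueue block map updates and move on.
        for (long id = 0; id < inodeCount; id++) {
            final long blockId = id;
            updater.execute(() -> blockMap.put(blockId, "inode-" + blockId));
        }
        updater.shutdown(); // no new tasks; already queued tasks keep running
        sectionsLoaded.add("INODE_DIR");

        // Assumption: this later section never reads blockMap. If it did,
        // it could observe a partially populated map.
        sectionsLoaded.add("OTHER_SECTION");

        // Only now block until every queued update has been applied.
        try {
            if (!updater.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("updates did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sectionsLoaded;
    }

    public static void main(String[] args) {
        Map<Long, String> blockMap = new ConcurrentHashMap<>();
        List<String> sections = loadImage(1000, blockMap);
        System.out.println(sections + ", " + blockMap.size() + " blocks");
    }
}
```

The benefit depends on how much work the remaining sections provide for the 
background updates to overlap with; the sketch says nothing about actual HDFS 
timings.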

By the way, I will refactor some code per your suggestions and submit a patch 
soon.
  

 


> Update block map and name cache in parallel while loading fsimage.
> ------------------------------------------------------------------
>
>                 Key: HDFS-15493
>                 URL: https://issues.apache.org/jira/browse/HDFS-15493
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Chengwei Wang
>            Priority: Major
>         Attachments: HDFS-15493.001.patch, fsimage-loading.log
>
>
> While loading the INodeDirectorySection of the fsimage, the loader updates 
> the name cache and block map after adding each inode file to its inode 
> directory. Running these steps in parallel would reduce the fsimage loading 
> time.
> In our test case, with patches HDFS-13694 and HDFS-14617 applied, loading the 
> fsimage (220M files & 240M blocks) takes 470s; with this patch, the time 
> drops to 410s.



