[ 
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174438#comment-17174438
 ] 

Stephen O'Donnell commented on HDFS-15493:
------------------------------------------

[~smarthan] Thanks for the update. I think we are mostly good now. Just two 
more things:

1) You have a blank line at line 309 in FSImageFormatPBINode.java:

{code}
    private void addToCacheAndBlockMap(final ArrayList<INode> inodeList) {
>> This line is blank
      final ArrayList<INode> inodes = new ArrayList<>(inodeList);
      nameCacheUpdateExecutor.submit(
{code}

2) I discussed this change with one of my colleagues, and he suggested we 
extend the unit test you added to take some snapshots and rename some files. 
This creates some INodeReference objects, so the test covers that code path 
too. We can then dump the filesystem tree before and after saving the namespace 
and ensure the two dumps are identical. I have adjusted your test to do this:

{code}
  @Test
  public void testUpdateBlocksMapAndNameCacheAsync() throws IOException {
    Configuration conf = new Configuration();
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build();
    cluster.waitActive();
    DistributedFileSystem fs = cluster.getFileSystem();
    FSDirectory fsdir = cluster.getNameNode().namesystem.getFSDirectory();
    File workingDir = GenericTestUtils.getTestDir();

    File preRestartTree = new File(workingDir, "preRestartTree");
    File postRestartTree = new File(workingDir, "postRestartTree");

    Path baseDir = new Path("/user/foo");
    fs.mkdirs(baseDir);
    fs.allowSnapshot(baseDir);
    for (int i = 0; i < 5; i++) {
      Path dir = new Path(baseDir, Integer.toString(i));
      fs.mkdirs(dir);
      for (int j = 0; j < 5; j++) {
        Path file = new Path(dir, Integer.toString(j));
        FSDataOutputStream os = fs.create(file);
        os.write((byte) j);
        os.close();
      }
      fs.createSnapshot(baseDir, "snap_" + i);
      fs.rename(new Path(dir, "0"), new Path(dir, "renamed"));
    }
    SnapshotTestHelper.dumpTree2File(fsdir, preRestartTree);

    // checkpoint
    fs.setSafeMode(SafeModeAction.SAFEMODE_ENTER);
    fs.saveNamespace();
    fs.setSafeMode(SafeModeAction.SAFEMODE_LEAVE);

    cluster.restartNameNode();
    cluster.waitActive();
    fs = cluster.getFileSystem();
    fsdir = cluster.getNameNode().namesystem.getFSDirectory();

    // Ensure all the files created above exist and their contents are correct.
    for (int i = 0; i < 5; i++) {
      Path dir = new Path(baseDir, Integer.toString(i));
      assertTrue(fs.getFileStatus(dir).isDirectory());
      for (int j = 0; j < 5; j++) {
        Path file = new Path(dir, Integer.toString(j));
        if (j == 0) {
          file = new Path(dir, "renamed");
        }
        FSDataInputStream in = fs.open(file);
        int n = in.readByte();
        assertEquals(j, n);
        in.close();
      }
    }
    SnapshotTestHelper.dumpTree2File(fsdir, postRestartTree);
    SnapshotTestHelper.compareDumpedTreeInFile(
        preRestartTree, postRestartTree, true);
  }
{code}

If you could fix the blank line and add the above unit test, I am +1 to commit 
this. Thanks for all your work on this.

> Update block map and name cache in parallel while loading fsimage.
> ------------------------------------------------------------------
>
>                 Key: HDFS-15493
>                 URL: https://issues.apache.org/jira/browse/HDFS-15493
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Chengwei Wang
>            Priority: Major
>         Attachments: HDFS-15493.001.patch, HDFS-15493.002.patch, 
> HDFS-15493.003.patch, HDFS-15493.004.patch, HDFS-15493.005.patch, 
> HDFS-15493.006.patch, fsimage-loading.log
>
>
> While loading the INodeDirectorySection of the fsimage, the name cache and 
> block map are updated after each inode file is added to its inode directory. 
> Running these steps in parallel would reduce the time cost of fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617 applied, the time 
> cost to load the fsimage (220M files & 240M blocks) is 470s; with this patch, 
> the time cost is reduced to 410s.
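
For readers who have not looked at the patch, here is a minimal sketch of the general idea in the description above: the loading thread hands each batch to a background executor and keeps going, then waits for the executor to drain before the loaded state is used. This is not the code from the HDFS-15493 patches. The class name, the string-keyed map standing in for the name cache and block map, the single-thread executor, and the one-minute timeout are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the image loader; not the actual patch. */
public class AsyncCacheUpdateSketch {
  // Hypothetical stand-in for the name cache / block map.
  private final Map<String, Integer> cache = new ConcurrentHashMap<>();
  // Assumption: one background thread, so batches apply in submission order.
  private final ExecutorService updateExecutor =
      Executors.newSingleThreadExecutor();

  /** Hand a batch to the background thread and return immediately. */
  void addToCacheAsync(List<String> batch) {
    // Copy first: the caller reuses and clears its own list.
    final List<String> copy = new ArrayList<>(batch);
    updateExecutor.submit(() -> {
      for (String name : copy) {
        cache.merge(name, 1, Integer::sum);
      }
    });
  }

  /** Block until every submitted batch has been applied. */
  void waitForUpdates() {
    updateExecutor.shutdown();
    try {
      if (!updateExecutor.awaitTermination(1, TimeUnit.MINUTES)) {
        throw new IllegalStateException("cache updates did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
  }

  int count(String name) {
    return cache.getOrDefault(name, 0);
  }

  public static void main(String[] args) {
    AsyncCacheUpdateSketch loader = new AsyncCacheUpdateSketch();
    List<String> batch = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      batch.add("file" + (i % 10));
      if (batch.size() == 100) {
        loader.addToCacheAsync(batch);
        batch.clear(); // safe because the task works on its own copy
      }
    }
    loader.waitForUpdates();
    System.out.println(loader.count("file0"));
  }
}
```

The defensive copy mirrors the `new ArrayList<>(inodeList)` line quoted in point 1 of the comment: without it, the loading thread could clear or refill the list while the background task is still iterating over it.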



