[GitHub] [hadoop] sodonnel commented on a change in pull request #2918: HDFS-15987. Improve oiv tool to parse fsimage file in parallel with d…

GitBox Tue, 20 Apr 2021 08:54:19 -0700


sodonnel commented on a change in pull request #2918:
URL: https://github.com/apache/hadoop/pull/2918#discussion_r616821726




##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/PBImageTextWriter.java
##########
@@ -649,14 +679,123 @@ private void output(Configuration conf, FileSummary summary,
         is = FSImageUtil.wrapInputStreamForCompression(conf,
             summary.getCodec(), new BufferedInputStream(new LimitInputStream(
                 fin, section.getLength())));
-        outputINodes(is);
+        INodeSection s = INodeSection.parseDelimitedFrom(is);
+        LOG.info("Found {} INodes in the INode section", s.getNumInodes());
+        int count = outputINodes(is, out);
+        LOG.info("Outputted {} INodes.", count);
       }
     }
     afterOutput();
     long timeTaken = Time.monotonicNow() - startTime;
     LOG.debug("Time to output inodes: {}ms", timeTaken);
   }
 
+  /**
+   * STEP1: Multi-threaded process sub-sections
+   * Given n (1<n<=k) threads to process k sections,
+   * E.g. 10 sections and 4 threads, grouped as follows:
+   * |---------------------------------------------------------------|
+   * | (0    1    2)    (3    4    5)    (6    7)     (8    9)       |
+   * | thread[0]        thread[1]        thread[2]    thread[3]      |
+   * |---------------------------------------------------------------|
+   *
+   * STEP2: Merge files.
+   */
+  private void outputInParallel(Configuration conf, FileSummary summary,

Review comment:
       I think it's a good idea to make this change in stages. If parallelizing 
just the INODE section processing cuts the runtime in half, then it's a good 
change. We can always open more Jiras to parallelize the other sections after 
this one is done, if it makes sense.
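
The grouping described in the Javadoc of the hunk above (10 sections across 4 threads) could be sketched as follows. This is a minimal illustration, not the code from the PR: the class `SectionGrouping` and the method `group` are hypothetical names, and the rule that the first `k % n` threads each take one extra section is an assumption chosen to reproduce the 3/3/2/2 split shown in the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PBImageTextWriter.
final class SectionGrouping {
  private SectionGrouping() {}

  /**
   * Splits section indices 0..sections-1 into `threads` contiguous groups
   * whose sizes differ by at most one. Assumption: the first
   * (sections % threads) groups receive the extra section.
   */
  static List<List<Integer>> group(int sections, int threads) {
    if (threads < 1 || threads > sections) {
      throw new IllegalArgumentException("need 1 <= threads <= sections");
    }
    int base = sections / threads;   // minimum sections per thread
    int extra = sections % threads;  // number of threads that get one more
    List<List<Integer>> groups = new ArrayList<>(threads);
    int next = 0;                    // next unassigned section index
    for (int t = 0; t < threads; t++) {
      int size = base + (t < extra ? 1 : 0);
      List<Integer> g = new ArrayList<>(size);
      for (int i = 0; i < size; i++) {
        g.add(next++);
      }
      groups.add(g);
    }
    return groups;
  }
}
```

Calling `SectionGrouping.group(10, 4)` yields the groups (0 1 2), (3 4 5), (6 7), (8 9), matching the diagram. Each group could then be handed to one worker that writes its own temporary file before the merge step.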




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
