prashantwason commented on a change in pull request #3873:
URL: https://github.com/apache/hudi/pull/3873#discussion_r743232434



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -645,4 +612,83 @@ protected void doClean(AbstractHoodieWriteClient writeClient, String instantTime
     // metadata table.
     writeClient.clean(instantTime + "002");
   }
+
+  /**
+   * Commit the {@code HoodieRecord}s to Metadata Table as a new delta-commit.
+   *
+   */
+  protected abstract void commit(List<HoodieRecord> records, String partitionName, String instantTime);
+
+  /**
+   * Commit the partition to file listing information to Metadata Table as a new delta-commit.
+   *
+   */
+  protected abstract void commit(List<DirectoryInfo> dirInfoList, String createInstantTime);
+
+
+  /**
+   * A class which represents a directory and the files and directories inside it.
+   *
+   * A {@code PartitionFileInfo} object saves the name of the partition and various properties requires of each file
+   * required for bootstrapping the metadata table. Saving limited properties reduces the total memory footprint when
+   * a very large number of files are present in the dataset being bootstrapped.
+   */
+  public static class DirectoryInfo implements Serializable {
+    // Relative path of the directory (relative to the base directory)
+    private String relativePath;
+    // List of filenames within this partition
+    private List<String> filenames;
+    // Length of the various files
+    private List<Long> filelengths;
+    // List of directories within this partition
+    private List<Path> subdirs = new ArrayList<>();
+    // Is this a HUDI partition
+    private boolean isPartition = false;
+
+    public DirectoryInfo(String relativePath, FileStatus[] fileStatus) {
+      this.relativePath = relativePath;
+
+      // Pre-allocate with the maximum length possible
+      filenames = new ArrayList<>(fileStatus.length);
+      filelengths = new ArrayList<>(fileStatus.length);
+
+      for (FileStatus status : fileStatus) {
+        if (status.isDirectory()) {
+          this.subdirs.add(status.getPath());
+        } else if (status.getPath().getName().equals(HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE)) {
+          // Presence of partition meta file implies this is a HUDI partition
+          this.isPartition = true;
+        } else if (FSUtils.isDataFile(status.getPath())) {
+          // Regular HUDI data file (base file or log file)
+          filenames.add(status.getPath().getName());
+          filelengths.add(status.getLen());
+        }
+      }
+    }
+
+    public String getRelativePath() {
+      return relativePath;
+    }
+
+    public int getTotalFiles() {
+      return filenames.size();
+    }
+
+    public boolean isPartition() {
+      return isPartition;
+    }
+
+    public List<Path> getSubdirs() {
+      return subdirs;
+    }
+
+    // Returns a map of filenames mapped to their lengths
+    public Map<String, Long> getFileMap() {

Review comment:
       Done. A good simplification indeed.




