nsivabalan commented on a change in pull request #4519:
URL: https://github.com/apache/hudi/pull/4519#discussion_r781376321



##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieMetadataBootstrap.java
##########
@@ -76,6 +80,36 @@ public void testMetadataBootstrapInsertUpsertClean(HoodieTableType tableType) th
     bootstrapAndVerify();
   }
 
+  /**
+   * Validate that bootstrap considers only files part of completed commit and ignore any extra files.
+   */
+  @Test
+  public void testMetadataBootstrapWithExtraFiles() throws Exception {
+    HoodieTableType tableType = COPY_ON_WRITE;
+    init(tableType, false);
+    doPreBootstrapWriteOperation(testTable, INSERT, "0000001");
+    doPreBootstrapWriteOperation(testTable, "0000002");
+    doPreBootstrapClean(testTable, "0000003", Arrays.asList("0000001"));
+    doPreBootstrapWriteOperation(testTable, "0000005");
+    // add few extra files to table. bootstrap should include those files.
+    String fileName = UUID.randomUUID().toString();
+    Path baseFilePath = FileCreateUtils.getBaseFilePath(basePath, "p1", "0000006", fileName);
+    FileCreateUtils.createBaseFile(basePath, "p1", "0000006", fileName, 100);

Review comment:
       If it's part of the timeline, bootstrap may not kick in. Also, I'm not sure 
we would gain much from it. This test fails without the source-code fix in this 
patch, so we should be good. Let me know what you think.

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -746,9 +746,16 @@ protected void bootstrapCommit(List<DirectoryInfo> partitionInfoList, String cre
     HoodieData<HoodieRecord> partitionRecords = engineContext.parallelize(Arrays.asList(allPartitionRecord), 1);
     if (!partitionInfoList.isEmpty()) {
       HoodieData<HoodieRecord> fileListRecords = engineContext.parallelize(partitionInfoList, partitionInfoList.size()).map(partitionInfo -> {
+        Map<String, Long> fileNameToSizeMap = partitionInfo.getFileNameToSizeMap();
+        // filter for files that are part of the completed commits
+        Map<String, Long> validFileNameToSizeMap = fileNameToSizeMap.entrySet().stream().filter(fileSizePair -> {
+          String commitTime = FSUtils.getCommitTime(fileSizePair.getKey());
+          return HoodieTimeline.compareTimestamps(commitTime, HoodieTimeline.LESSER_THAN_OR_EQUALS, createInstantTime);

Review comment:
       Bootstrap itself is triggered only if all operations are complete. 
If there was a partially failed commit, bootstrap may not kick in unless an 
explicit rollback happens.
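
For illustration, the filter in the hunk above can be sketched without any Hudi dependencies. This is only a minimal sketch under stated assumptions: base file names are assumed to end in `_<instantTime>.<extension>`, and instant times are assumed to be equal-length strings that order lexicographically. `CompletedCommitFilter`, `commitTimeOf`, and `keepUpTo` are hypothetical names standing in for `FSUtils.getCommitTime` and `HoodieTimeline.compareTimestamps`; they are not part of the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class CompletedCommitFilter {

  // Hypothetical stand-in for FSUtils.getCommitTime. Assumes the file name
  // looks like <fileId>_<writeToken>_<instantTime>.<extension>.
  static String commitTimeOf(String fileName) {
    String[] parts = fileName.split("_");
    String last = parts[parts.length - 1];
    int dot = last.indexOf('.');
    return dot < 0 ? last : last.substring(0, dot);
  }

  // Keeps only entries whose instant time is <= createInstantTime.
  // Assumes instant times are equal-length strings, so plain String
  // comparison gives the intended ordering.
  static Map<String, Long> keepUpTo(Map<String, Long> fileNameToSize, String createInstantTime) {
    return fileNameToSize.entrySet().stream()
        .filter(e -> commitTimeOf(e.getKey()).compareTo(createInstantTime) <= 0)
        .collect(Collectors.toMap(
            Map.Entry::getKey, Map.Entry::getValue, (a, b) -> a, LinkedHashMap::new));
  }

  public static void main(String[] args) {
    Map<String, Long> files = new LinkedHashMap<>();
    files.put("f1_1-0-1_0000002.parquet", 100L);
    files.put("f2_1-0-1_0000005.parquet", 100L);
    // Stray file written under an instant later than the bootstrap instant.
    files.put("f3_1-0-1_0000006.parquet", 100L);
    System.out.println(keepUpTo(files, "0000005").keySet());
  }
}
```

With `createInstantTime` set to `"0000005"`, the stray `0000006` file is dropped and the other two entries are kept, which mirrors what the new test expects from bootstrap.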




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


