danny0405 commented on code in PR #9871:
URL: https://github.com/apache/hudi/pull/9871#discussion_r1379469734


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/HoodieTimelineArchiver.java:
##########
@@ -270,13 +270,41 @@ private List<HoodieInstant> getCommitInstantsToArchive() 
throws IOException {
             // stop at first savepoint commit
             return !firstSavepoint.isPresent() || 
compareTimestamps(s.getTimestamp(), LESSER_THAN, 
firstSavepoint.get().getTimestamp());
           }
-        }).filter(s -> earliestInstantToRetain
-            .map(instant -> compareTimestamps(s.getTimestamp(), LESSER_THAN, 
instant.getTimestamp()))
+        })
+        .filter(s -> earliestInstantToRetain.map(
+            instant -> compareTimestamps(s.getTimestamp(), LESSER_THAN, 
instant.getTimestamp())
+                // for the compaction c2 in metadata table triggered by commit 
c1 in data table,
+                // c2.getTimestamp() > c1.getTimestamp() because: c2 happens 
before c1 completes,
+                // and we are generating new instant time for c2 after c1 has 
started. Effectively,

Review Comment:
   > Lines 208-222 is fine as it is trying to get the earliest candidate 
instant to retain. The main filtering happens after all such candidate instants 
have been collected. I think the fix should be where the filtering happens.
   
   Still think line 208 ~ 222 is the right place to fix. It actually does the 
similar thing before by filtering the instants with MDT compaction instant, why 
not just fix it by using the completion time?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to