fhan688 commented on code in PR #19041:
URL: https://github.com/apache/hudi/pull/19041#discussion_r3445693478


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/versioning/v2/TimelineArchiverV2.java:
##########
@@ -236,7 +237,11 @@ private List<HoodieInstant> getCommitInstantsToArchive() 
throws IOException {
             config.getCleanerPolicy());
     
earliestInstantToRetainCandidates.add(earliestInstantToRetainForClustering);
 
-    // 4. If metadata table is enabled, do not archive instants which are more 
recent than the last compaction on the
+    // 4. If enabled, block archival based on ECTR from the last completed 
clean to ensure we don't archive
+    // commits that have data files that haven't been cleaned yet.
+    
earliestInstantToRetainCandidates.add(getEarliestInstantToRetainForClean(table, 
completedCommitsTimeline, config));

Review Comment:
   > This makes TimelineArchiverV2 (LSM/v9) block on clean ECTR, inverting the 
previously-asserted behavior (the old test documented V1-only). Please confirm 
V1-only was an oversight rather than a deliberate LSM restriction. Note this 
also adds a new failure mode: V2 archival now aborts with `HoodieIOException` 
if the clean-metadata read fails (when the config is enabled).
   
   The V1-only behavior looks like an oversight rather than an intentional 
LSM/v9 restriction. The config is about preventing archival from crossing the 
latest clean `ECTR`, so the safety boundary should be independent of the 
archiver version.
   
     For the read failure case, this intentionally keeps the same fail-closed 
behavior as V1 when `hoodie.archive.block.on.clean.ectr` is enabled. If the 
clean metadata cannot be read, we cannot safely know the archival boundary, so 
aborting is preferable to archiving through an unknown clean `ECTR`. The config 
remains disabled by default, so this only affects users who explicitly enable 
the protection.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to