fhan688 commented on code in PR #19041:
URL: https://github.com/apache/hudi/pull/19041#discussion_r3445693478
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/versioning/v2/TimelineArchiverV2.java:
##########
@@ -236,7 +237,11 @@ private List<HoodieInstant> getCommitInstantsToArchive()
throws IOException {
config.getCleanerPolicy());
earliestInstantToRetainCandidates.add(earliestInstantToRetainForClustering);
- // 4. If metadata table is enabled, do not archive instants which are more
recent than the last compaction on the
+ // 4. If enabled, block archival based on ECTR from the last completed
clean to ensure we don't archive
+ // commits that have data files that haven't been cleaned yet.
+
earliestInstantToRetainCandidates.add(getEarliestInstantToRetainForClean(table,
completedCommitsTimeline, config));
Review Comment:
> This makes TimelineArchiverV2 (LSM/v9) block on clean ECTR, inverting the
previously-asserted behavior (the old test documented V1-only). Please confirm
V1-only was an oversight rather than a deliberate LSM restriction. Note this
also adds a new failure mode: V2 archival now aborts with `HoodieIOException`
if the clean-metadata read fails (when the config is enabled).
The V1-only behavior looks like an oversight rather than an intentional
LSM/v9 restriction. The config is about preventing archival from crossing the
latest clean `ECTR`, so the safety boundary should be independent of the
archiver version.
For the read failure case, this intentionally keeps the same fail-closed
behavior as V1 when `hoodie.archive.block.on.clean.ectr` is enabled. If the
clean metadata cannot be read, we cannot safely know the archival boundary, so
aborting is preferable to archiving through an unknown clean `ECTR`. The config
remains disabled by default, so this only affects users who explicitly enable
the protection.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]