hudi-agent commented on code in PR #19041:
URL: https://github.com/apache/hudi/pull/19041#discussion_r3445495234
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/utils/ArchivalUtils.java:
##########
@@ -110,6 +113,37 @@ public static Pair<Integer,Integer>
getMinAndMaxInstantsToKeep(HoodieTable<?, ?,
return Pair.of(minInstantsToKeep, maxInstantsToKeep);
}
+ public static Option<HoodieInstant> getEarliestInstantToRetainForClean(
+ HoodieTable<?, ?, ?, ?> table,
+ HoodieTimeline commitTimeline,
+ HoodieWriteConfig config) {
+ if (!config.shouldBlockArchivalOnCleanECTR()) {
+ return Option.empty();
+ }
+
+ Option<HoodieInstant> lastCleanInstant =
table.getCleanTimeline().filterCompletedInstants().lastInstant();
+ if (!lastCleanInstant.isPresent()) {
+ return Option.empty();
+ }
+
+ try {
+ HoodieCleanMetadata cleanMetadata =
table.getActiveTimeline().readCleanMetadata(lastCleanInstant.get());
+ String earliestCommitToRetain =
cleanMetadata.getEarliestCommitToRetain();
+ if (earliestCommitToRetain == null ||
earliestCommitToRetain.trim().isEmpty()) {
+ return Option.empty();
+ }
+
+ Option<HoodieInstant> earliestInstantToRetain =
+
commitTimeline.findInstantsAfterOrEquals(earliestCommitToRetain).firstInstant();
+ log.info("Blocking archival based on earliest commit to retain {} from
last clean {}. Earliest instant to retain is {}",
+ earliestCommitToRetain, lastCleanInstant.get().requestedTime(),
earliestInstantToRetain.map(instant -> instant).orElse(null));
Review Comment:
🤖 nit: `.map(instant -> instant)` is a no-op identity transform — could you
drop it and use `earliestInstantToRetain.orElse(null)` directly?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java:
##########
@@ -1059,6 +1061,20 @@ public class FlinkOptions extends HoodieConfig {
.withFallbackKeys("hoodie.clean.fileversions.retained")
.withDescription("Number of file versions to retain. default 5");
+ public static final ConfigOption<Long> CLEAN_MAX_COMMITS_TO_CLEAN =
ConfigOptions
Review Comment:
🤖 nit: the `_TO_CLEAN` suffix is redundant here — every other option in this
class uses the pattern `CLEAN_<descriptor>` (e.g. `CLEAN_RETAIN_COMMITS`,
`CLEAN_RETAIN_HOURS`) without repeating the namespace. Could you rename this to
`CLEAN_MAX_COMMITS` to stay consistent?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-common/src/main/java/org/apache/hudi/common/util/CleanerUtils.java:
##########
@@ -151,8 +152,16 @@ public static Option<HoodieInstant>
getEarliestCommitToRetain(
} else if (cleaningPolicy == HoodieCleaningPolicy.KEEP_LATEST_BY_HOURS) {
ZonedDateTime latestDateTime = ZonedDateTime.ofInstant(latestInstant,
timeZone.getZoneId());
String earliestTimeToRetain =
TimelineUtils.formatDate(Date.from(latestDateTime.minusHours(hoursRetained).toInstant()));
- earliestCommitToRetain =
Option.fromJavaOptional(completedCommitsTimeline.getInstantsAsStream().filter(i
-> compareTimestamps(i.requestedTime(),
- GREATER_THAN_OR_EQUALS, earliestTimeToRetain)).findFirst());
+ earliestCommitToRetain =
Option.fromJavaOptional(completedCommitsTimeline.getInstantsAsStream()
+ .filter(i -> compareTimestamps(i.requestedTime(),
GREATER_THAN_OR_EQUALS, earliestTimeToRetain))
+ .findFirst());
+
+ Option<HoodieInstant> earliestPendingCommit = commitsTimeline.filter(s
-> !s.isCompleted()).firstInstant();
Review Comment:
🤖 Could you confirm the intent when `earliestCommitToRetain` is initially
empty (no completed commit at-or-after the cutoff) but a pending instant
exists? With this change, ECTR gets set to the last completed before the
pending — typically the latest completed — so a quiet table with one in-flight
commit would flip from "no cleaning" to "clean everything older than the latest
completed". The parallel `KEEP_LATEST_COMMITS` branch avoids this because its
pending check is nested under `countInstants > commitsRetained`. Was this
asymmetry intended, or should the override only apply when the initial ECTR was
already present?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]