shangxinli opened a new pull request, #18778: URL: https://github.com/apache/hudi/pull/18778

### Describe the issue this Pull Request addresses

Part of the freshness-tracking work discussed in #17512. This PR implements **Phase 1** of the reconcile plan: expose the per-partition event-time rollup that is already latent on disk, and stop gating watermark tracking on `EVENT_TIME_ORDERING` so freshness observability works for COW / `COMMIT_TIME_ORDERING` tables too.

This is purely additive: no commit-metadata key added, no Avro schema change, and no behavior change for tables that have not opted into `hoodie.write.track.event.time.watermark`.

### Summary and Changelog

Today `WriteStatus.markSuccess()` already folds min/max event time into each `HoodieWriteStat` (and the Avro schema already serializes them per stat alongside `partitionPath`). But the only public accessor on `HoodieCommitMetadata` is `getMinAndMaxEventTime()`, which collapses every partition into a single pair, so consumers asking *"how fresh is partition dt=2026-05-19?"* have to walk `partitionToWriteStats` themselves.

Watermark tracking is also currently gated on `recordMergeMode == EVENT_TIME_ORDERING`, even though freshness observability is independent of merge semantics. The result is that COW tables with `COMMIT_TIME_ORDERING` silently get no watermark even when the user explicitly opts in.

This PR:

- **Adds `HoodieCommitMetadata.getMinAndMaxEventTimePerPartition()`**: a pure aggregation over `partitionToWriteStats` that returns `Map<String, Pair<Option<Long>, Option<Long>>>`. Partitions whose stats carry no event time at all are omitted (so the map size reflects partitions with freshness data, not total partitions written). Min/max within a partition are folded with `Math.min` / `Math.max`, mirroring the semantics of the existing global getter. No persisted bytes, no Avro change.
- **Decouples watermark tracking from `EVENT_TIME_ORDERING`** in `HoodieWriteHandle`.
  Tracking now activates when `eventTimeFieldName != null && hoodie.write.track.event.time.watermark=true`, regardless of merge mode. The unused `EVENT_TIME_ORDERING` static import is removed.
- **Tests:** five new unit tests for the rollup API in `TestHoodieCommitMetadata` (folding across stats within a partition, omitting partitions without event time, handling partial min/max, empty metadata, and a consistency check against the global getter). The existing `testShouldTrackEventTimeWaterMarkerAvroRecordTypeWithCommitTimeOrdering` is updated to assert the new behavior (now tracks), and a negative test is added for the missing-event-time-field case. Full hudi-common (1897 tests) and hudi-client-common (1026 tests) suites pass locally.

### Impact

Public-API addition on `HoodieCommitMetadata`: external tools (catalogs, freshness exporters, lineage UIs) can now read per-partition freshness directly without walking write stats.

Behavior change for opted-in tables: COW / `COMMIT_TIME_ORDERING` tables with `hoodie.write.track.event.time.watermark=true` and an event-time field will now populate min/max on write stats; previously the flag was silently a no-op for them. Tables that have *not* set the flag see no change.

No performance impact: the rollup is a pure in-memory aggregation that callers invoke on demand, and watermark extraction at write time was already gated on the same per-record path.

### Risk Level

low

The new method is additive. The behavior change is conditional on a config that is `false` by default and gated on an event-time field name; tables not using the flag are unaffected. Verified by running the full `hudi-common` and `hudi-client-common` test suites locally with no regressions.

### Documentation Update

The `hoodie.write.track.event.time.watermark` config description should be updated on the Hudi website to reflect that it no longer requires `EVENT_TIME_ORDERING`.
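
For anyone documenting or consuming the rollup, the per-partition fold described in the changelog can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in this PR: `Stat`, `MinMax`, and `rollup` are hypothetical stand-ins for `HoodieWriteStat`, `Pair<Option<Long>, Option<Long>>`, and `getMinAndMaxEventTimePerPartition()`, and `java.util.Optional` is substituted for Hudi's `Option` so the example has no Hudi dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class PartitionFreshnessSketch {

  /** Hypothetical stand-in for a write stat: partition path plus nullable event-time bounds. */
  public static final class Stat {
    final String partitionPath;
    final Long minEventTime; // null when the stat carries no minimum
    final Long maxEventTime; // null when the stat carries no maximum

    public Stat(String partitionPath, Long minEventTime, Long maxEventTime) {
      this.partitionPath = partitionPath;
      this.minEventTime = minEventTime;
      this.maxEventTime = maxEventTime;
    }
  }

  /** Hypothetical result holder for one partition's folded bounds. */
  public static final class MinMax {
    public final Optional<Long> min;
    public final Optional<Long> max;

    MinMax(Optional<Long> min, Optional<Long> max) {
      this.min = min;
      this.max = max;
    }
  }

  /** Folds min/max per partition; partitions with no event time at all are omitted. */
  public static Map<String, MinMax> rollup(List<Stat> stats) {
    Map<String, MinMax> result = new TreeMap<>();
    for (Stat s : stats) {
      if (s.minEventTime == null && s.maxEventTime == null) {
        continue; // contributes nothing; the partition appears only if another stat has data
      }
      MinMax prev = result.get(s.partitionPath);
      Optional<Long> prevMin = prev == null ? Optional.<Long>empty() : prev.min;
      Optional<Long> prevMax = prev == null ? Optional.<Long>empty() : prev.max;
      result.put(s.partitionPath,
          new MinMax(fold(prevMin, s.minEventTime, true), fold(prevMax, s.maxEventTime, false)));
    }
    return result;
  }

  /** Combines an accumulated bound with a nullable new value using Math.min or Math.max. */
  private static Optional<Long> fold(Optional<Long> acc, Long value, boolean takeMin) {
    if (value == null) {
      return acc; // partial stat: keep whatever has been accumulated so far
    }
    if (!acc.isPresent()) {
      return Optional.of(value);
    }
    return Optional.of(takeMin ? Math.min(acc.get(), value) : Math.max(acc.get(), value));
  }

  public static void main(String[] args) {
    List<Stat> stats = new ArrayList<>();
    stats.add(new Stat("dt=2026-05-19", 100L, 250L));
    stats.add(new Stat("dt=2026-05-19", 50L, 200L));
    stats.add(new Stat("dt=2026-05-20", null, 300L)); // partial: max only
    stats.add(new Stat("dt=2026-05-21", null, null)); // no event time: omitted
    rollup(stats).forEach((p, mm) -> System.out.println(p + " -> " + mm.min + " / " + mm.max));
  }
}
```

In this sketch a partition with only one bound present keeps the other side empty, and a partition whose stats all lack event time never enters the map, which matches the omission rule stated in the changelog.
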

The new `getMinAndMaxEventTimePerPartition()` API is internally documented via Javadoc; a website page covering per-partition freshness consumption can land alongside Phase 2 (upstream propagation) so users see the end-to-end story in one place.

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
