hudi-agent commented on code in PR #18778:
URL: https://github.com/apache/hudi/pull/18778#discussion_r3271078715


##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java:
##########
@@ -497,6 +497,42 @@ public Pair<Option<Long>, Option<Long>> 
getMinAndMaxEventTime() {
         maxEventTime == Long.MIN_VALUE ? Option.empty() : 
Option.of(maxEventTime));
   }
 
+  /**
+   * Returns per-partition min/max event time folded across this commit's 
write stats.
+   *
+   * <p>Each entry corresponds to a partition for which at least one write 
stat carried a
+   * non-null {@link HoodieWriteStat#getMinEventTime()} or {@link 
HoodieWriteStat#getMaxEventTime()}.
+   * Partitions whose write stats have no event-time information at all are 
omitted from the
+   * returned map. The min/max within a partition are folded with {@code 
Math.min} / {@code Math.max}
+   * over its write stats, mirroring the semantics of {@link 
#getMinAndMaxEventTime()}.
+   *
+   * <p>This is a pure aggregation over {@code partitionToWriteStats} — it 
adds no persisted
+   * bytes and does not change the commit avro schema.

Review Comment:
   🤖 nit: "does not change the commit avro schema" reads more like a PR 
justification than an API contract — a future reader may wonder why a read-only 
aggregation method needs to promise schema stability, and the note could become 
misleading if per-partition event times are ever persisted. Have you considered 
dropping this sentence (or moving it to a code comment inside the method 
instead)?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java:
##########
@@ -497,6 +497,42 @@ public Pair<Option<Long>, Option<Long>> 
getMinAndMaxEventTime() {
         maxEventTime == Long.MIN_VALUE ? Option.empty() : 
Option.of(maxEventTime));
   }
 
+  /**
+   * Returns per-partition min/max event time folded across this commit's 
write stats.
+   *
+   * <p>Each entry corresponds to a partition for which at least one write 
stat carried a
+   * non-null {@link HoodieWriteStat#getMinEventTime()} or {@link 
HoodieWriteStat#getMaxEventTime()}.
+   * Partitions whose write stats have no event-time information at all are 
omitted from the
+   * returned map. The min/max within a partition are folded with {@code 
Math.min} / {@code Math.max}
+   * over its write stats, mirroring the semantics of {@link 
#getMinAndMaxEventTime()}.
+   *
+   * <p>This is a pure aggregation over {@code partitionToWriteStats} — it 
adds no persisted
+   * bytes and does not change the commit avro schema.
+   */
+  public Map<String, Pair<Option<Long>, Option<Long>>> 
getMinAndMaxEventTimePerPartition() {
+    Map<String, Pair<Option<Long>, Option<Long>>> result = new HashMap<>();
+    for (Map.Entry<String, List<HoodieWriteStat>> entry : 
partitionToWriteStats.entrySet()) {

Review Comment:
   🤖 nit: the sentinel-value fold (null-check → `Math.min`/`Math.max` → ternary 
back to `Option`) is essentially the same inner loop as in 
`getMinAndMaxEventTime()`. Could you extract a private 
`foldEventTimes(List<HoodieWriteStat>)` helper that both methods share? That 
way any future change to null-handling or sentinel semantics only needs to land 
in one place.
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to