hudi-agent commented on code in PR #18843:
URL: https://github.com/apache/hudi/pull/18843#discussion_r3305185160


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -672,17 +704,29 @@ private void bufferDelete(HoodieRecord<T> hoodieRecord) {
 
   /**
    * Checks if the number of records have reached the set threshold and then 
flushes the records to disk.
+   *
+   * <p>{@code bufferedRecord} is the record that was just appended to {@link 
#recordList} by
+   * {@link #writeToBuffer} (or {@code null} for delete/ignored windows where 
{@code recordList}
+   * did not grow). Sizing this object — rather than the incoming pre-{@code 
prepareRecord}
+   * record — keeps {@link #averageRecordSize} aligned with what is actually 
retained in heap.
+   * The incoming record's payload is typically still compact/deflated; the 
buffered record
+   * holds the fully-materialized Avro {@code IndexedRecord} with prepended 
meta-fields, which
+   * is what {@code maxBlockSize} is meant to bound.
    */
-  protected void flushToDiskIfRequired(HoodieRecord record, boolean 
appendDeleteBlocks) {
-    if (numberOfRecords >= (int) (maxBlockSize / averageRecordSize)
-        || numberOfRecords % NUMBER_OF_RECORDS_TO_ESTIMATE_RECORD_SIZE == 0) {
-      averageRecordSize = (long) (averageRecordSize * 0.8 + 
sizeEstimator.sizeEstimate(record) * 0.2);
+  protected void flushToDiskIfRequired(HoodieRecord bufferedRecord, boolean 
appendDeleteBlocks) {

Review Comment:
   🤖 nit: `Math.max(averageRecordSize, 1)` is redundant here — the preceding 
`averageRecordSize == 0` short-circuit already guarantees `averageRecordSize > 
0` by the time this division runs. Could you drop the `Math.max` to keep the 
gate condition easier to read?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -467,8 +481,8 @@ public void doAppend() {
     while (recordItr.hasNext()) {
       HoodieRecord record = recordItr.next();
       init(record);
-      flushToDiskIfRequired(record, false);
-      writeToBuffer(record);
+      HoodieRecord bufferedRecord = writeToBuffer(record);
+      flushToDiskIfRequired(bufferedRecord, false);
     }
     appendDataAndDeleteBlocks(header, true);
     estimatedNumberOfBytesWritten += averageRecordSize * numberOfRecords;

Review Comment:
   🤖 [Line 491] For a delete-only handle, `averageRecordSize` stays 0 
throughout, so this tail update accumulates `+= 0 * numberOfRecords`. 
`canWrite()` reads `estimatedNumberOfBytesWritten` to decide log-file rolling, 
so it would now stay `true` indefinitely on delete-only handles (whereas the 
previous `init()` seed gave it some non-zero growth). Is that intentional, or 
worth seeding `averageRecordSize` from a delete-shaped sample as a fallback? 
Related to @danny0405's flush comment but distinct (rolling vs intermediate 
flushing).
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -691,6 +735,32 @@ protected void flushToDiskIfRequired(HoodieRecord record, 
boolean appendDeleteBl
     }
   }
 
+  @VisibleForTesting
+  void setSizeEstimator(SizeEstimator<HoodieRecord> sizeEstimator) {
+    this.sizeEstimator = sizeEstimator;
+  }
+
+  @VisibleForTesting
+  long getAverageRecordSize() {
+    return averageRecordSize;
+  }
+
+  @VisibleForTesting
+  long getNumberOfRecordsForTest() {
+    return numberOfRecords;
+  }
+
+  @VisibleForTesting

Review Comment:
   🤖 nit: the `ForTest` suffix is inconsistent across the new 
@VisibleForTesting accessors — `getAverageRecordSize()` omits it while 
`getNumberOfRecordsForTest()` and `getEstimatedBytesWrittenForTest()` carry it. 
Could you drop the suffix on all three (the annotation already conveys intent) 
so they read uniformly?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to