hudi-agent commented on code in PR #18843:
URL: https://github.com/apache/hudi/pull/18843#discussion_r3308218533
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -672,17 +699,29 @@ private void bufferDelete(HoodieRecord<T> hoodieRecord) {
/**
* Checks if the number of records have reached the set threshold and then flushes the records to disk.
+ *
+ * <p>{@code bufferedRecord} is the record that was just appended to {@link #recordList} by
+ * {@link #writeToBuffer} (or {@code null} for delete/ignored windows where {@code recordList}
+ * did not grow). Sizing this object — rather than the incoming pre-{@code prepareRecord}
+ * record — keeps {@link #averageRecordSize} aligned with what is actually retained in heap.
+ * The incoming record's payload is typically still compact/deflated; the buffered record
+ * holds the fully-materialized Avro {@code IndexedRecord} with prepended meta-fields, which
+ * is what {@code maxBlockSize} is meant to bound.
*/
- protected void flushToDiskIfRequired(HoodieRecord record, boolean appendDeleteBlocks) {
- if (numberOfRecords >= (int) (maxBlockSize / averageRecordSize)
- || numberOfRecords % NUMBER_OF_RECORDS_TO_ESTIMATE_RECORD_SIZE == 0) {
- averageRecordSize = (long) (averageRecordSize * 0.8 + sizeEstimator.sizeEstimate(record) * 0.2);
+ protected void flushToDiskIfRequired(HoodieRecord bufferedRecord, boolean appendDeleteBlocks) {
+ if (bufferedRecord != null
+ && (averageRecordSize == 0
Review Comment:
🤖 nit: the `Math.max(averageRecordSize, 1)` here is redundant. When
`averageRecordSize == 0`, the clause on the line above short-circuits the OR,
so the division is only reached when `averageRecordSize` is nonzero, and for
any positive size the `max` has no effect. Consider simplifying it to plain
`maxBlockSize / averageRecordSize` to match the second gate below, so readers
don't have to wonder which case the `max` is guarding.
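To illustrate the short-circuit argument, here is a minimal, self-contained Java sketch. It assumes the first gate has the shape `averageRecordSize == 0 || numberOfRecords >= maxBlockSize / Math.max(averageRecordSize, 1)` (the diff above is cut off before that line) and that `averageRecordSize` is never negative. The class and method names are hypothetical, not Hudi's. Without the `== 0` clause, integer division `maxBlockSize / 0` would throw `ArithmeticException`; with it, `||` never evaluates the right operand in that case. The `updateAverage` helper mirrors the 0.8/0.2 moving average from the removed lines.

```java
// Hypothetical, simplified model of the flush gate discussed in the review.
// Names are illustrative only; this is not Hudi's HoodieAppendHandle code.
public class FlushGateSketch {

    // Assumed shape of the gate WITH the Math.max guard the review calls redundant.
    static boolean shouldReestimateGuarded(long numberOfRecords, long maxBlockSize, long averageRecordSize) {
        return averageRecordSize == 0
            || numberOfRecords >= maxBlockSize / Math.max(averageRecordSize, 1);
    }

    // Simplified gate: when averageRecordSize == 0 the left operand is true, so ||
    // short-circuits and the division on the right is never evaluated.
    static boolean shouldReestimateSimplified(long numberOfRecords, long maxBlockSize, long averageRecordSize) {
        return averageRecordSize == 0
            || numberOfRecords >= maxBlockSize / averageRecordSize;
    }

    // Exponential moving average mirroring the removed line
    // averageRecordSize = (long) (averageRecordSize * 0.8 + estimate * 0.2).
    // Uses long throughout; the original's (int) cast on the threshold is omitted.
    static long updateAverage(long averageRecordSize, long sizeEstimate) {
        return (long) (averageRecordSize * 0.8 + sizeEstimate * 0.2);
    }

    public static void main(String[] args) {
        long maxBlockSize = 256L * 1024 * 1024; // illustrative value only

        // avg == 0: returns true without ever dividing by zero.
        System.out.println(shouldReestimateSimplified(5, maxBlockSize, 0)); // true

        // For every positive average, both gates agree because Math.max(avg, 1) == avg.
        boolean allAgree = true;
        for (long avg = 1; avg <= 4096; avg *= 2) {
            for (long n = 0; n <= 1_000_000; n += 12_345) {
                if (shouldReestimateGuarded(n, maxBlockSize, avg)
                        != shouldReestimateSimplified(n, maxBlockSize, avg)) {
                    allAgree = false;
                }
            }
        }
        System.out.println(allAgree); // true

        // One EMA step: 0.8 * 100 + 0.2 * 200 = 120.
        System.out.println(updateAverage(100, 200)); // 120
    }
}
```

The equivalence only holds under the non-negativity assumption: for a negative average, `Math.max(averageRecordSize, 1)` would change the divisor, so the guard would no longer be a no-op.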
- AI-generated; verify before applying. React 👍/👎 to flag quality.