Re: [PR] fix(append): size buffered record, not incoming, in block-flush gate [hudi]

via GitHub Tue, 26 May 2026 00:48:47 -0700


danny0405 commented on code in PR #18843:
URL: https://github.com/apache/hudi/pull/18843#discussion_r3302091018



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -247,7 +248,10 @@ private void init(HoodieRecord record) {
     deltaWriteStat.setPartitionPath(partitionPath);
     deltaWriteStat.setFileId(fileId);
     Option<FileSlice> fileSliceOpt = populateWriteStatAndFetchFileSlice(record, deltaWriteStat);
-    averageRecordSize = sizeEstimator.sizeEstimate(record);
+    // averageRecordSize is seeded lazily in flushToDiskIfRequired on the first buffered
+    // (post-prepareRecord) record — sizing the incoming record here under-counts heap on
+    // Spark engines, where the incoming record is a compact UnsafeRow but the buffered

Review Comment:
   I didn't quite get where the UnsafeRow gets transformed into the Avro record.
   Spark still uses Avro as the record type in the writer path, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
