danny0405 commented on code in PR #18843:
URL: https://github.com/apache/hudi/pull/18843#discussion_r3308110903


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -247,7 +248,11 @@ private void init(HoodieRecord record) {
     deltaWriteStat.setPartitionPath(partitionPath);
     deltaWriteStat.setFileId(fileId);
     Option<FileSlice> fileSliceOpt = populateWriteStatAndFetchFileSlice(record, deltaWriteStat);
-    averageRecordSize = sizeEstimator.sizeEstimate(record);
+    // averageRecordSize is seeded lazily in flushToDiskIfRequired on the first buffered
+    // (post-prepareRecord) record. Sizing the incoming record here under-counts heap
+    // because recordList retains the post-prepareRecord clone: a fully-materialized Avro
+    // IndexedRecord with prepended meta-fields, whereas the incoming record's payload

Review Comment:
   We could try serializing the IndexedRecord into Avro bytes inside the HoodieRecord after the meta-fields are prepended. That keeps the in-memory records compact and narrows the gap between the in-memory record size and the size of the actual serialized log block.
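
   A minimal sketch of the suggestion, under heavy assumptions: a `Map<String, String>` stands in for the Avro `IndexedRecord`, and a toy length-prefixed encoding stands in for Avro binary encoding (real code would presumably go through Avro's `GenericDatumWriter` with a `BinaryEncoder` obtained from `EncoderFactory`). `prependMetaFields`, `serialize` and `CompactBuffer` are hypothetical names, not Hudi APIs. What it illustrates: once the buffer holds encoded bytes rather than materialized records, the tracked size is the encoded payload length byte-for-byte (block headers and any compression aside), so there is no per-record heap estimate left to drift away from the log block size.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a Map<String, String> stands in for an Avro IndexedRecord, and
// serialize() stands in for Avro binary encoding. Neither is a Hudi or Avro API.
public class CompactBufferSketch {

  // Mimics the post-prepareRecord shape: meta-fields first, then the data fields.
  public static Map<String, String> prependMetaFields(Map<String, String> meta, Map<String, String> data) {
    Map<String, String> out = new LinkedHashMap<>(meta);
    out.putAll(data);
    return out;
  }

  // Toy encoding (NOT Avro): for each field value, a 4-byte length prefix then its UTF-8 bytes.
  public static byte[] serialize(Map<String, String> record) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bos)) {
      for (String value : record.values()) {
        byte[] b = value.getBytes(StandardCharsets.UTF_8);
        out.writeInt(b.length);
        out.write(b);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray();
  }

  // Hypothetical buffer that retains serialized bytes instead of materialized records,
  // so the tracked size is the exact encoded payload size rather than a heap estimate.
  public static final class CompactBuffer {
    private final List<byte[]> records = new ArrayList<>();
    private long bufferedBytes = 0;

    public void add(byte[] serialized) {
      records.add(serialized);
      bufferedBytes += serialized.length;
    }

    public long bufferedBytes() {
      return bufferedBytes;
    }

    public long averageRecordSize() {
      return records.isEmpty() ? 0 : bufferedBytes / records.size();
    }

    // Flush once the encoded payload reaches a (hypothetical) max block size.
    public boolean shouldFlush(long maxBlockSizeBytes) {
      return bufferedBytes >= maxBlockSizeBytes;
    }
  }

  public static void main(String[] args) {
    Map<String, String> meta = new LinkedHashMap<>();
    meta.put("_hoodie_commit_time", "001");
    meta.put("_hoodie_record_key", "k1");
    Map<String, String> data = new LinkedHashMap<>();
    data.put("name", "ab");

    CompactBuffer buffer = new CompactBuffer();
    buffer.add(serialize(prependMetaFields(meta, data)));
    System.out.println("buffered bytes: " + buffer.bufferedBytes());
  }
}
```

   The likely trade-off is that anything reading the buffered records before the block is written may need to decode them again, so some cost shifts from size estimation to (de)serialization.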
