danny0405 commented on code in PR #18843:
URL: https://github.com/apache/hudi/pull/18843#discussion_r3308110903
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -247,7 +248,11 @@ private void init(HoodieRecord record) {
deltaWriteStat.setPartitionPath(partitionPath);
deltaWriteStat.setFileId(fileId);
Option<FileSlice> fileSliceOpt =
populateWriteStatAndFetchFileSlice(record, deltaWriteStat);
- averageRecordSize = sizeEstimator.sizeEstimate(record);
+ // averageRecordSize is seeded lazily in flushToDiskIfRequired on the first buffered
+ // (post-prepareRecord) record. Sizing the incoming record here under-counts heap
+ // because recordList retains the post-prepareRecord clone: a fully-materialized Avro
+ // IndexedRecord with prepended meta-fields, whereas the incoming record's payload
Review Comment:
We could serialize the IndexedRecord into Avro bytes inside the hoodie
record after the meta-fields are prepended. That keeps the buffered records
compact in memory and narrows the gap between their in-memory size and the
size of the actual serialized log block.
We did something similar in the spillable map to reduce spills. It fits this
buffering too: fewer small log blocks get written, which should improve
performance.
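
A minimal sketch of the idea, not Hudi code: records are held as serialized
bytes, so the tracked buffer size is exact and the flush threshold lines up
with the bytes that would land in the block. `CompactRecordBuffer`, its
methods, and the `DataOutputStream` encoding are hypothetical stand-ins; the
real change would presumably use Avro binary encoding on the
post-prepareRecord IndexedRecord.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; none of these names exist in Hudi. */
final class CompactRecordBuffer {
  private final long maxBlockBytes;
  private final List<byte[]> buffered = new ArrayList<>();
  private long bufferedBytes = 0;
  private int flushedBlocks = 0;

  CompactRecordBuffer(long maxBlockBytes) {
    this.maxBlockBytes = maxBlockBytes;
  }

  /**
   * Stand-in for Avro binary encoding of a record whose meta-fields
   * (here: commit time and record key) were already prepended.
   */
  static byte[] serialize(String commitTime, String recordKey, String payload) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bos)) {
      out.writeUTF(commitTime);
      out.writeUTF(recordKey);
      out.writeUTF(payload);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray();
  }

  /** Buffers the serialized record and flushes once the threshold is reached. */
  void add(byte[] serialized) {
    buffered.add(serialized);
    // Exact byte count: no estimate of object-graph heap size is needed.
    bufferedBytes += serialized.length;
    if (bufferedBytes >= maxBlockBytes) {
      flush();
    }
  }

  /** Stand-in for appending one log block; here it only counts and resets. */
  void flush() {
    if (buffered.isEmpty()) {
      return;
    }
    flushedBlocks++;
    buffered.clear();
    bufferedBytes = 0;
  }

  int flushedBlocks() {
    return flushedBlocks;
  }

  int bufferedCount() {
    return buffered.size();
  }

  long bufferedBytes() {
    return bufferedBytes;
  }
}
```

Because the buffer holds bytes rather than materialized objects, the same
heap budget fits more records per flush, which is what would cut down the
number of small log blocks.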