nsivabalan opened a new pull request, #18843: URL: https://github.com/apache/hudi/pull/18843
### Describe the issue this Pull Request addresses

`HoodieAppendHandle.bufferRecord` / `bufferInsertAndUpdate` adds the post-`prepareRecord` object to `recordList`. That object is a clone of `populatedRecord`: a `HoodieAvroIndexedRecord` wrapping a fully-deserialized `IndexedRecord`, many KB per record on Spark engines. But the block-flush gate in `flushToDiskIfRequired` sized the **incoming** record, which on Spark engines is a compact `UnsafeRow` many times smaller. The estimate under-counted retained heap, so the gate `numberOfRecords >= maxBlockSize / averageRecordSize` fired far too late, `recordList` grew well past one block's worth of heap, and the subsequent `HFileDataBlock` serialization OOMed on metadata-table writes.

For Avro-engine writes the incoming and post-prepare shapes are similar, so this change is effectively a no-op on those paths.

### Summary and Changelog

**Behavior change:** `averageRecordSize` now reflects the buffered (post-`prepareRecord`) record, the object actually retained on heap, rather than the incoming record handed to the handle.

**Code changes (`hudi-client-common/.../HoodieAppendHandle.java`):**

- `writeToBuffer`, `bufferRecord`, `bufferInsertAndUpdate` now return the buffered record (or `null` for delete/ignored/error paths).
- Call sites in `doAppend` / `doWrite` / `write(Map)` flip order: buffer first, then gate-check with the buffered record.
- `flushToDiskIfRequired` sizes the buffered record. Seeds the EWMA lazily on the first non-null buffered record (replacing the wrong eager seed in `init()` that ran before any `prepareRecord` conversion). Guards the gate against `averageRecordSize == 0` to avoid div-by-zero on delete-only prefixes.
- `sizeEstimator` changed from `final` to settable via `@VisibleForTesting setSizeEstimator`; other `@VisibleForTesting` getters added for the new test.
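The gate arithmetic described above can be sketched roughly as follows. This is a minimal illustration, not the actual `HoodieAppendHandle` code: the class `FlushGateSketch`, its method names, and the `ToLongFunction` estimator are hypothetical stand-ins. It samples every buffered record for simplicity, and it assumes the 0.8/0.2 blend weights the previous average at 0.8 and the new sample at 0.2. The sizes printed by `main` are made-up numbers chosen only to show how sizing the wrong object inflates the per-block record count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

/** Hypothetical sketch of a block-flush gate sized by the buffered record. */
final class FlushGateSketch<R> {
  private final long maxBlockSize;
  private final ToLongFunction<R> sizeEstimator; // stand-in for the handle's size estimator
  private final List<R> recordList = new ArrayList<>();
  private long averageRecordSize = 0; // 0 means "not seeded yet"
  private int flushCount = 0;

  FlushGateSketch(long maxBlockSize, ToLongFunction<R> sizeEstimator) {
    this.maxBlockSize = maxBlockSize;
    this.sizeEstimator = sizeEstimator;
  }

  /**
   * Called after buffering. {@code buffered} is the object retained on heap,
   * or null when nothing was retained (delete/ignored/error paths).
   */
  void onBuffered(R buffered) {
    if (buffered != null) {
      recordList.add(buffered);
      long sample = sizeEstimator.applyAsLong(buffered);
      // Lazy seed on the first non-null buffered record, EWMA afterwards.
      averageRecordSize = (averageRecordSize == 0)
          ? sample
          : (long) (averageRecordSize * 0.8 + sample * 0.2);
    }
    // Guard: skip the gate while the average is still 0 to avoid dividing by zero.
    if (averageRecordSize > 0 && recordList.size() >= maxBlockSize / averageRecordSize) {
      recordList.clear(); // stand-in for writing the block out
      flushCount++;
    }
  }

  long averageRecordSize() {
    return averageRecordSize;
  }

  int bufferedCount() {
    return recordList.size();
  }

  int flushCount() {
    return flushCount;
  }

  public static void main(String[] args) {
    long maxBlockSize = 256 * 1024; // illustrative block size in bytes
    long incomingSize = 64;         // illustrative size of a compact incoming row
    long bufferedSize = 4096;       // illustrative size of the buffered record
    System.out.println("records per block if sized by incoming: " + maxBlockSize / incomingSize);
    System.out.println("records per block if sized by buffered: " + maxBlockSize / bufferedSize);
  }
}
```

With these illustrative numbers, sizing the incoming row lets the buffer hold many more records than one block's worth of the larger buffered objects, which is the under-counting this PR addresses.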
**Tests (`hudi-client-common/.../TestHoodieAppendHandle.java`, new):** 6 unit tests covering:

- estimator sees buffered (not incoming) record;
- lazy initial seed;
- delete-only windows do not perturb the estimate or trigger flush;
- EWMA blends 0.8/0.2 after the second sample;
- gate fires when buffered records exceed `maxBlockSize`;
- harness self-check.

### Impact

- **Spark engine MOR writes**: block-flush gate now trips correctly. `recordList` stays bounded to ~one `maxBlockSize` worth of buffered records, so downstream HFile / Avro serialization heap stays bounded. Fixes the heap-pressure failure mode we have observed on metadata-RLI writes.
- **Avro engine writes**: incoming and post-prepare shapes are similar, so this is effectively a no-op.
- `estimatedNumberOfBytesWritten` (consumed by `canWrite` to roll log file groups) now reflects the JOL size of the buffered Avro record, a strict overestimate of on-disk bytes. Log file groups roll slightly earlier on Spark engines as a result; the existing `hoodie.logfile.to.parquet.compression.ratio` knob retunes if needed.
- No public API change. `sizeEstimator` is a private field; the `@VisibleForTesting` setter is for unit tests only.

### Risk Level

**low**: the change is local to one file, behind-the-scenes (no API/config), and is a no-op for the Avro path. The Spark-path behavior change is a bug fix: the buffer is now bounded as `maxBlockSize` already promises. The only externally-visible knock-on is that `canWrite` may roll log file groups slightly earlier on Spark; the `logFileToParquetCompressionRatio` config exists precisely for this approximation.

### Documentation Update

none

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable

-- This is an automated message from the Apache Git Service.
