wombatu-kun opened a new pull request, #19015: URL: https://github.com/apache/hudi/pull/19015
### Describe the issue this Pull Request addresses

`AbstractConnectWriter.writeRecord` allocates a new `AvroConvertor` on every record via `new AvroConvertor(schemaProvider.getSourceHoodieSchema())`. The `AvroConvertor(HoodieSchema)` constructor serializes the schema to a string, and on the `StringConverter` path the first use of each convertor lazily parses the schema (`HoodieSchema.parse`) and builds a new `MercifulJsonConverter`. Since a fresh convertor is created per record, this work repeats on every record of the hot write path.

### Summary and Changelog

Build the `AvroConvertor` once per writer and reuse it across all records, and read the configured value converter into a `final` field instead of looking it up per record. A writer is created per commit and `writeRecord` runs on a single thread, so the convertor is created lazily on first use in the `StringConverter` branch and reused for the remainder of the commit. The Avro path no longer allocates a convertor at all (previously one was created but never used). The produced `HoodieRecord`s and record keys are unchanged.

### Impact

Performance only; no public API or behavior change. Local JMH micro-benchmark of `writeRecord` (AverageTime mode, gc profiler, with a caching schema provider so the numbers are conservative):

- StringConverter (JSON): 18419 -> 2495 ns/op (-86%, 7.4x faster); allocations 33040 -> 3951 B/op (-88%)
- AvroConverter: 2574 -> 792 ns/op (-69%, 3.25x faster); allocations 10602 -> 1306 B/op (-88%)

Per-record CPU time on the connect write path drops by roughly 3-7x, and per-record garbage by roughly 8x. Benchmark code is not included in this PR.

### Risk Level

low

Behavior-preserving reuse of an object that is already meant to be reused (`AvroConvertor` is constructed once per partition elsewhere in Hudi). Covered by the existing `hudi-kafka-connect` unit tests; `TestAbstractConnectWriter` exercises both the Avro and JSON paths and passes.
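
The reuse pattern described under "Summary and Changelog" can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `CountingConvertor`, `PerRecordWriter`, and `ReusingWriter` are hypothetical stand-ins for `AvroConvertor` and the connect writer, the schema is a plain string, and the single-threaded assumption stated above is what makes the unsynchronized lazy field acceptable.

```java
import java.util.function.Supplier;

public class ConvertorReuseSketch {

    /** Hypothetical stand-in for a convertor that is costly to construct. */
    static final class CountingConvertor {
        // Counts constructions so the two writers can be compared.
        static int constructions = 0;
        private final String schema;

        CountingConvertor(String schema) {
            constructions++;
            this.schema = schema;
        }

        String convert(String json) {
            return schema + ":" + json;
        }
    }

    /** Before: a new convertor is built for every record. */
    static final class PerRecordWriter {
        private final Supplier<String> schemaSupplier;

        PerRecordWriter(Supplier<String> schemaSupplier) {
            this.schemaSupplier = schemaSupplier;
        }

        String writeRecord(String json) {
            return new CountingConvertor(schemaSupplier.get()).convert(json);
        }
    }

    /** After: the convertor is built lazily on first use and then reused. */
    static final class ReusingWriter {
        private final Supplier<String> schemaSupplier;
        // Not thread-safe; assumes writeRecord is called from a single thread.
        private CountingConvertor convertor;

        ReusingWriter(Supplier<String> schemaSupplier) {
            this.schemaSupplier = schemaSupplier;
        }

        String writeRecord(String json) {
            if (convertor == null) {
                convertor = new CountingConvertor(schemaSupplier.get());
            }
            return convertor.convert(json);
        }
    }

    public static void main(String[] args) {
        PerRecordWriter before = new PerRecordWriter(() -> "schema");
        for (int i = 0; i < 3; i++) {
            before.writeRecord("r" + i);
        }
        System.out.println("per-record constructions: " + CountingConvertor.constructions);

        CountingConvertor.constructions = 0;
        ReusingWriter after = new ReusingWriter(() -> "schema");
        for (int i = 0; i < 3; i++) {
            after.writeRecord("r" + i);
        }
        System.out.println("reused constructions: " + CountingConvertor.constructions);
    }
}
```

Both writers return the same converted value for the same input; only the number of convertor constructions differs, which mirrors the claim that the produced records are unchanged.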
### Documentation Update

none

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Enough context is provided in the sections above
- [ ] Adequate tests were added if applicable
