wombatu-kun opened a new pull request, #19015:
URL: https://github.com/apache/hudi/pull/19015

   ### Describe the issue this Pull Request addresses
   
   `AbstractConnectWriter.writeRecord` allocates a new `AvroConvertor` on every 
record via `new AvroConvertor(schemaProvider.getSourceHoodieSchema())`. The 
`AvroConvertor(HoodieSchema)` constructor serializes the schema to a string, 
and on the `StringConverter` path the first use of each convertor lazily parses 
the schema (`HoodieSchema.parse`) and builds a new `MercifulJsonConverter`. 
Since a fresh convertor is created per record, this work repeats on every 
record of the hot write path.
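
The following toy example shows why per-record construction is costly. `ToyConvertor` and `PerRecordConvertorDemo` are hypothetical stand-ins, not Hudi classes. The sketch assumes only what is described above: a convertor parses its schema lazily on first use, so building a fresh convertor for each record repeats that parse for every record.

```java
import java.util.List;

// Hypothetical demo of the pre-fix pattern: one convertor per record.
public class PerRecordConvertorDemo {
  // Builds a new convertor for every record, as writeRecord did before this PR,
  // and returns how many times the schema was parsed.
  static int writeAll(List<String> records, String schemaJson) {
    ToyConvertor.resetParseCount();
    for (String record : records) {
      new ToyConvertor(schemaJson).fromJson(record);
    }
    return ToyConvertor.parseCount();
  }

  public static void main(String[] args) {
    List<String> records = List.of("{\"a\":1}", "{\"a\":2}", "{\"a\":3}");
    // prints "schema parses: 3"
    System.out.println("schema parses: " + writeAll(records, "{\"type\":\"record\"}"));
  }
}

// Hypothetical stand-in for AvroConvertor: it keeps the schema as a string and
// "parses" it lazily on the first conversion, counting each parse.
final class ToyConvertor {
  private static int parseCount;

  private final String schemaJson;
  private String parsedSchema; // placeholder for the parsed schema + JSON converter

  ToyConvertor(String schemaJson) {
    this.schemaJson = schemaJson;
  }

  static void resetParseCount() { parseCount = 0; }

  static int parseCount() { return parseCount; }

  String fromJson(String json) {
    if (parsedSchema == null) {
      parseCount++;                      // the expensive step that repeats per record
      parsedSchema = schemaJson.strip(); // stand-in for real schema parsing
    }
    return json; // a real convertor would return an Avro record here
  }
}
```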
   
   ### Summary and Changelog
   
   Build the `AvroConvertor` once per writer and reuse it across all records, 
and read the configured value converter into a `final` field instead of looking 
it up per record. A writer is created per commit and `writeRecord` runs on a 
single thread, so the convertor can be created lazily, without synchronization, 
on first use in the `StringConverter` branch and reused for the remainder of the 
commit. The Avro path no longer allocates a convertor at all (previously one was 
created but never used). The produced `HoodieRecord`s and record keys are unchanged.
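
Here is a minimal sketch of the reuse pattern described above. It assumes a single-threaded writer that lives for one commit. `ToyWriter`, `CountingConvertor` and the `ValueConverter` enum are hypothetical, and the actual change to `AbstractConnectWriter` is not reproduced here. In the sketch, the converter choice is a `final` field, the convertor is built on the first `StringConverter` record, and the Avro branch never allocates one.

```java
import java.util.List;

// Hypothetical sketch of the post-fix pattern: one lazily built convertor per writer.
public class ReusedConvertorDemo {
  public static void main(String[] args) {
    CountingConvertor.resetCreated();
    ToyWriter writer = new ToyWriter(ToyWriter.ValueConverter.STRING, "{\"type\":\"record\"}");
    for (String record : List.of("{\"a\":1}", "{\"a\":2}", "{\"a\":3}")) {
      writer.writeRecord(record);
    }
    System.out.println("convertors built: " + CountingConvertor.created()); // prints "convertors built: 1"
  }
}

// Hypothetical stand-in for AvroConvertor that counts how many instances exist.
final class CountingConvertor {
  private static int created;
  private final String schemaJson;

  CountingConvertor(String schemaJson) {
    this.schemaJson = schemaJson;
    created++;
  }

  static void resetCreated() { created = 0; }

  static int created() { return created; }

  String fromJson(String json) {
    return json; // a real convertor would parse json against schemaJson
  }
}

// Hypothetical writer: one instance per commit, called from a single thread.
final class ToyWriter {
  enum ValueConverter { STRING, AVRO }

  private final ValueConverter valueConverter; // read once at construction, not per record
  private final String schemaJson;
  private CountingConvertor convertor;         // built on first STRING record; no locking needed

  ToyWriter(ValueConverter valueConverter, String schemaJson) {
    this.valueConverter = valueConverter;
    this.schemaJson = schemaJson;
  }

  String writeRecord(String value) {
    switch (valueConverter) {
      case STRING:
        if (convertor == null) {
          convertor = new CountingConvertor(schemaJson);
        }
        return convertor.fromJson(value);
      case AVRO:
        return value; // Avro values need no convertor, so none is allocated
      default:
        throw new IllegalStateException("Unsupported converter: " + valueConverter);
    }
  }
}
```

Because only one thread calls `writeRecord`, a plain null check is enough for the lazy field. A writer shared across threads would instead need synchronization or eager construction.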
   
   ### Impact
   
   Performance only; no public API or behavior change. Local JMH 
micro-benchmark of `writeRecord` (AverageTime mode, gc profiler, with a caching 
schema provider so the numbers are conservative):
   
   - StringConverter (JSON): 18419 -> 2495 ns/op (-86%, 7.4x faster); 
allocations 33040 -> 3951 B/op (-88%)
   - AvroConverter: 2574 -> 792 ns/op (-69%, 3.25x faster); allocations 10602 
-> 1306 B/op (-88%)
   
   Per-record write time drops about 7.4x on the JSON path and 3.25x on the Avro 
path; per-record allocation drops about 8x on both. Benchmark code is not 
included in this PR.
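
The speedup and percentage figures above can be derived from the raw ns/op and B/op values. This small Java check recomputes them, using speedup = before / after and change = (after - before) / before. It only re-derives the reported numbers and is not the benchmark itself.

```java
// Recomputes the speedups and percentage changes from the reported JMH numbers.
public class BenchmarkRatios {
  // Speedup factor: old time (or allocation) divided by new.
  static double speedup(double before, double after) {
    return before / after;
  }

  // Percentage change from before to after; negative means a reduction.
  static double percentChange(double before, double after) {
    return (after - before) / before * 100.0;
  }

  public static void main(String[] args) {
    String[] names = {"StringConverter (JSON)", "AvroConverter"};
    // Columns: ns/op before, ns/op after, B/op before, B/op after.
    double[][] rows = {
      {18419, 2495, 33040, 3951},
      {2574, 792, 10602, 1306},
    };
    for (int i = 0; i < rows.length; i++) {
      double[] r = rows[i];
      System.out.printf("%s: time %.2fx (%.0f%%), alloc %.2fx (%.0f%%)%n",
          names[i],
          speedup(r[0], r[1]), percentChange(r[0], r[1]),
          speedup(r[2], r[3]), percentChange(r[2], r[3]));
    }
  }
}
```

Time improves about 7.38x on the JSON path and 3.25x on the Avro path. Allocations fall about 8.36x and 8.12x, which is roughly -88% on both paths.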
   
   ### Risk Level
   
   low
   
   Behavior-preserving reuse of an object that is already meant to be reused 
(`AvroConvertor` is constructed once per partition elsewhere in Hudi). Covered 
by the existing `hudi-kafka-connect` unit tests; `TestAbstractConnectWriter` 
exercises both the Avro and JSON paths and passes.
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
