wombatu-kun opened a new pull request, #19018:
URL: https://github.com/apache/hudi/pull/19018

   ### Describe the issue this Pull Request addresses
   
   `HoodieSinkTask.put` allocates a new `TopicPartition(topic, partition)` for 
every incoming record solely to look up that record's participant in the 
`transactionParticipants` map, and then discards the key. On a high-throughput 
sink this adds one short-lived allocation per record. A JMH micro-benchmark 
confirms that the allocation is real and is not eliminated by escape analysis.
   
   ### Summary and Changelog
   
   Maintain a secondary `topic -> partition -> participant` index alongside the 
existing `transactionParticipants` map, populated and cleared at the same 
lifecycle points (`bootstrap`, `close`, `cleanup`). Route records through this 
index in `put()` using a topic string lookup plus a partition `int` lookup 
(boxed `Integer` values from -128 to 127 are cached by `Integer.valueOf`, so 
boxing a small partition number does not allocate), which removes the 
per-record `TopicPartition` allocation. The primary `TopicPartition`-keyed map is 
unchanged and still used by the assignment loop, `preCommit`, and partition 
close.
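   
   The routing change described above can be sketched roughly as follows. This 
is a minimal, dependency-free illustration and not the code in this PR: 
`TopicPartitionKey` stands in for Kafka's `TopicPartition`, and 
`ParticipantRouter`, `register`, `unregister`, `lookupBaseline`, and 
`lookupIndexed` are hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for Kafka's TopicPartition, so the sketch has no dependencies. */
final class TopicPartitionKey {
    final String topic;
    final int partition;

    TopicPartitionKey(String topic, int partition) {
        this.topic = topic;
        this.partition = partition;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TopicPartitionKey)) return false;
        TopicPartitionKey other = (TopicPartitionKey) o;
        return partition == other.partition && topic.equals(other.topic);
    }

    @Override
    public int hashCode() {
        return 31 * topic.hashCode() + partition;
    }
}

/** Hypothetical router: a primary map plus a topic -> partition -> participant index. */
final class ParticipantRouter<P> {
    // Primary map, keyed the same way as before the change.
    private final Map<TopicPartitionKey, P> primary = new HashMap<>();
    // Secondary index, kept in sync with the primary map at every mutation.
    private final Map<String, Map<Integer, P>> byTopic = new HashMap<>();

    void register(String topic, int partition, P participant) {
        primary.put(new TopicPartitionKey(topic, partition), participant);
        byTopic.computeIfAbsent(topic, t -> new HashMap<>())
               .put(Integer.valueOf(partition), participant);
    }

    void unregister(String topic, int partition) {
        primary.remove(new TopicPartitionKey(topic, partition));
        Map<Integer, P> partitions = byTopic.get(topic);
        if (partitions != null) {
            partitions.remove(Integer.valueOf(partition));
            if (partitions.isEmpty()) {
                byTopic.remove(topic);
            }
        }
    }

    void clear() {
        primary.clear();
        byTopic.clear();
    }

    /** Baseline lookup: builds a throwaway key object for every call. */
    P lookupBaseline(String topic, int partition) {
        return primary.get(new TopicPartitionKey(topic, partition));
    }

    /** Indexed lookup: two map reads and no composite key object. */
    P lookupIndexed(String topic, int partition) {
        Map<Integer, P> partitions = byTopic.get(topic);
        return partitions == null ? null : partitions.get(Integer.valueOf(partition));
    }
}
```

   In this sketch both lookups return the same participant for any registered 
topic and partition, and `null` otherwise. Note that `Integer.valueOf` is only 
guaranteed to return cached instances for values from -128 to 127, so with a 
boxed inner map a partition number outside that range may still box unless the 
JIT removes the allocation.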
   
   ### Impact
   
   Performance only; no public API or behavior change. JMH micro-benchmark of 
routing one record to its participant (AverageTime mode, gc profiler):
   
   | Metric (per record) | Baseline (new TopicPartition) | After (nested map) |
   |---------------------|------------------------------:|-------------------:|
   | Time | 11.76 ns/op | 10.80 ns/op (-8%) |
   | Allocations | 24 B/op | ~0 B/op |
   
   This is a small per-record win; at high record rates it removes roughly 24 B 
of garbage per record on the `put()` path. Benchmark code is not included in 
this PR.
   
   ### Risk Level
   
   low
   
   Behavior-preserving routing refactor: the secondary index mirrors 
`transactionParticipants`, is maintained at the same lifecycle points, and is 
read only on the single task thread, so its lookups are equivalent to the 
previous `TopicPartition`-keyed lookup. The full `hudi-kafka-connect` unit 
suite passes. `HoodieSinkTask.put` is not directly unit-tested, so the change 
is intentionally kept to a minimal mirror of the existing lookup.
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
