wombatu-kun opened a new pull request, #19017:
URL: https://github.com/apache/hudi/pull/19017

   ### Describe the issue this Pull Request addresses
   
   `BufferedConnectWriter.flushRecords` materializes the buffered records into 
a new `LinkedList` before passing them to `writeClient.upsertPreppedRecords` / 
`bulkInsertPreppedRecords`. A `LinkedList` allocates a separate node object 
(holding the element reference plus two link references) for every record, and 
it has poor cache locality when the write client iterates it once.
   
   ### Summary and Changelog
   
   Replace the two `new LinkedList<>(bufferedRecords.values())` call sites with 
`new ArrayList<>(bufferedRecords.values())`. The `ArrayList` is pre-sized from 
the source collection, stores the elements in a single contiguous array, and is 
iterated once downstream. Behavior is unchanged: both are a 
`List<HoodieRecord>` fully materialized from the spillable map and handed to 
the same write-client methods.
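
The swap can be illustrated with a minimal sketch. It assumes a plain `LinkedHashMap<String, String>` as a stand-in for the spillable map of `HoodieRecord`s; the class name `FlushListSketch` and its helper methods are hypothetical and are not part of the Hudi code base.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

final class FlushListSketch {

  // Before: one node object is allocated for every buffered value.
  public static <K, V> List<V> materializeLinked(Map<K, V> buffered) {
    return new LinkedList<>(buffered.values());
  }

  // After: the values are copied into a single backing array.
  public static <K, V> List<V> materializeArray(Map<K, V> buffered) {
    return new ArrayList<>(buffered.values());
  }

  public static void main(String[] args) {
    // Hypothetical stand-in for the map of buffered records.
    Map<String, String> buffered = new LinkedHashMap<>();
    buffered.put("key1", "record1");
    buffered.put("key2", "record2");
    buffered.put("key3", "record3");

    List<String> before = materializeLinked(buffered);
    List<String> after = materializeArray(buffered);

    // List.equals compares elements in order, whatever the implementation,
    // so a consumer that only iterates sees the same sequence either way.
    System.out.println(before.equals(after)); // prints "true"
  }
}
```

Both helpers return a `List` with the same elements in the same order, which is why the change is invisible to a caller that only iterates the list.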
   
   ### Impact
   
   Performance only; no public API or behavior change. This runs once per 
commit, with cost proportional to the number of buffered records. A JMH 
micro-benchmark that builds the list from a 10,000-record map and iterates it 
once (AverageTime mode, gc profiler) gives:
   
   | Metric (10,000 records)       | Baseline (LinkedList) | After (ArrayList) |
   |-------------------------------|----------------------:|------------------:|
   | Time per build + iterate      | 122.3 us              | 91.7 us (-25%)    |
   | Allocated per build + iterate | 280049 B              | 80033 B (-71%)    |
   
   The list allocation drops from about 28 B/record (linked nodes) to about 8 
B/record (a single backing array). Benchmark code is not included in this PR.
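
The per-record figures and percentages follow directly from the table. The arithmetic is reproduced here as a small sketch; the class and method names are hypothetical, and the inputs are the four measurements quoted above.

```java
import java.util.Locale;

final class BenchmarkArithmetic {

  // Bytes allocated per record for a given total and record count.
  public static double perRecord(long totalBytes, int records) {
    return (double) totalBytes / records;
  }

  // Relative change from `before` to `after`, in percent (negative = reduction).
  public static double percentChange(double before, double after) {
    return (after - before) / before * 100.0;
  }

  public static void main(String[] args) {
    int records = 10_000;
    // Locale.ROOT keeps the decimal separator stable across environments.
    System.out.println(String.format(Locale.ROOT, "LinkedList: %.1f B/record", perRecord(280_049, records)));
    System.out.println(String.format(Locale.ROOT, "ArrayList:  %.1f B/record", perRecord(80_033, records)));
    System.out.println(String.format(Locale.ROOT, "Time:       %.1f%%", percentChange(122.3, 91.7)));
    System.out.println(String.format(Locale.ROOT, "Allocation: %.1f%%", percentChange(280_049, 80_033)));
  }
}
```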
   
   ### Risk Level
   
   low
   
   Drop-in `List` replacement; the downstream write clients accept any `List` 
and iterate it once. Covered by the existing `hudi-kafka-connect` unit tests; 
`TestBufferedConnectWriter` asserts the records passed to 
`bulkInsertPreppedRecords` and passes.
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable
   

