JimmyWang6 commented on PR #20246:
URL: https://github.com/apache/kafka/pull/20246#issuecomment-3479273960

   Hi @AndrewJSchofield,
   Thanks for your review and comments.
   
   > I do want to reiterate that we should not be splitting batches on the 
broker unless we really have to
   
   I completely agree! Splitting batches will indeed introduce additional 
overhead for both the server and the client.
   
   However, from a high-level perspective, if the producer configures larger 
values for `linger.ms` and `batch.size`, a single batch may contain a large 
number of records (e.g., 10,000), which improves throughput. In this 
scenario, if a share consumer sets a relatively low value for 
`maxFetchRecords` (e.g., 5) in `record_limit` mode, batch splitting will 
inevitably occur on the broker. To mitigate this overhead, we could recommend 
that users reduce both the `batch.size` and `linger.ms` values.
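   
   As a rough illustration of that recommendation, a producer could be 
configured along the following lines. This is only a sketch: the class and 
method names are hypothetical, and the values `5` and `4096` are assumptions 
chosen for the example, not tuned recommendations. Note that `batch.size` is a 
limit in bytes, not a record count, so the number of records per batch still 
depends on record size.

```java
import java.util.Properties;

public class SmallBatchProducerConfig {
    // Hypothetical helper: builds producer properties that favour smaller batches.
    // The concrete values below are illustrative assumptions only.
    public static Properties smallBatchProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Wait at most 5 ms for more records before sending a batch.
        props.setProperty("linger.ms", "5");
        // Cap each batch at 4096 bytes (the limit is in bytes, not records).
        props.setProperty("batch.size", "4096");
        return props;
    }
}
```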
   
   > In the case where we can give back entire batches but are still below the 
limit, then the entire batches should be returned
   
   This is essentially what I am trying to achieve based on the current 
implementation. Batch splitting only occurs in the `acquireNewBatchRecords` 
method, primarily in the following cases:
   
   1. No overlap with request offsets in the cache.
   2. The inflight batch is not a full match and only a subset of records 
should be acquired.
   3. There are not enough records up to `maxFetchRecords` in `cacheState`, so 
we need to acquire more records from batches.
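   
   To make the intended behaviour concrete, here is a simplified sketch of the 
idea (it is not the code in this PR, and `AcquireSketch`, `OffsetRange` and 
`acquire` are hypothetical names): whole batches are returned while they fit 
under `maxFetchRecords`, and only the batch that would cross the limit is 
split.

```java
import java.util.ArrayList;
import java.util.List;

public class AcquireSketch {
    // Hypothetical stand-in for a record batch: an inclusive offset range.
    public static final class OffsetRange {
        public final long first;
        public final long last;

        public OffsetRange(long first, long last) {
            this.first = first;
            this.last = last;
        }

        public long count() {
            return last - first + 1;
        }
    }

    // Returns the offset ranges to acquire, keeping batches whole where possible.
    public static List<OffsetRange> acquire(List<OffsetRange> batches, int maxFetchRecords) {
        List<OffsetRange> acquired = new ArrayList<>();
        long remaining = maxFetchRecords;
        for (OffsetRange batch : batches) {
            if (remaining <= 0) {
                break;
            }
            if (batch.count() <= remaining) {
                // The whole batch fits under the limit: return it unsplit.
                acquired.add(batch);
                remaining -= batch.count();
            } else {
                // Only this batch crosses the limit: acquire a prefix of it.
                acquired.add(new OffsetRange(batch.first, batch.first + remaining - 1));
                remaining = 0;
            }
        }
        return acquired;
    }
}
```

   With batches covering offsets 0-9 and 10-10009 and a limit of 15, this 
sketch returns the first batch whole and then offsets 10-14 from the second.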
   
   I will further investigate whether additional optimizations can be applied 
to the cases above.
   
-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
