JimmyWang6 commented on PR #20246: URL: https://github.com/apache/kafka/pull/20246#issuecomment-3479273960
Hi @AndrewJSchofield,

Thanks for your review and comments.

> I do want to reiterate that we should not be splitting batches on the broker unless we really have to

I completely agree! Splitting batches does introduce additional overhead for both the server and the client. However, from a high-level perspective, if the producer configures larger values for `linger.ms` and `batch.size`, a single batch may contain a large number of records (e.g., 10,000), which could optimize throughput. In this scenario, if a share consumer sets a relatively low value for `maxFetchRecords` (e.g., 5) in `record_limit` mode, batch splitting will inevitably occur on the broker. To mitigate this overhead, we could recommend that users reduce both the `batch.size` and `linger.ms` values.

> In the case where we can give back entire batches but are still below the limit, then the entire batches should be returned

This is essentially what I am trying to achieve based on the current implementation. Batch splitting only occurs in the `acquireNewBatchRecords` method, primarily in the following cases:

1. No overlap with request offsets in the cache.
2. The inflight batch is not a full match and only a subset of records should be acquired.
3. There are not enough records up to `maxFetchRecords` in `cacheState`, so we need to acquire more records from batches.

I will further investigate whether additional optimizations can be applied to the cases above.
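To make the trade-off above concrete, here is a minimal Java sketch of the rule "return whole batches while they fit under the limit, and split only when the next batch would exceed it". It is a simplified model under stated assumptions, not the broker's actual `acquireNewBatchRecords` logic: the names `BatchSplitSketch`, `Batch`, `Acquired`, and `acquire` are hypothetical, and in-flight state, overlapping request offsets, and `cacheState` are ignored entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of record-limited acquisition.
// None of these types exist in Kafka; they only illustrate the idea.
class BatchSplitSketch {

    // A producer batch covering offsets [firstOffset, lastOffset], inclusive.
    static final class Batch {
        final long firstOffset;
        final long lastOffset;

        Batch(long firstOffset, long lastOffset) {
            this.firstOffset = firstOffset;
            this.lastOffset = lastOffset;
        }

        long count() {
            return lastOffset - firstOffset + 1;
        }
    }

    // An acquired offset range; split is true when only part of a batch was taken.
    static final class Acquired {
        final long firstOffset;
        final long lastOffset;
        final boolean split;

        Acquired(long firstOffset, long lastOffset, boolean split) {
            this.firstOffset = firstOffset;
            this.lastOffset = lastOffset;
            this.split = split;
        }

        @Override
        public String toString() {
            return "[" + firstOffset + ".." + lastOffset + (split ? ", split]" : ", whole]");
        }
    }

    // Acquire at most maxFetchRecords records, preferring whole batches.
    static List<Acquired> acquire(List<Batch> batches, long maxFetchRecords) {
        List<Acquired> result = new ArrayList<>();
        long remaining = maxFetchRecords;
        for (Batch batch : batches) {
            if (remaining <= 0) {
                break;
            }
            if (batch.count() <= remaining) {
                // The whole batch fits under the limit: no split needed.
                result.add(new Acquired(batch.firstOffset, batch.lastOffset, false));
                remaining -= batch.count();
            } else {
                // The batch exceeds what is left of the limit: take only a prefix.
                result.add(new Acquired(batch.firstOffset, batch.firstOffset + remaining - 1, true));
                remaining = 0;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // One large batch (10,000 records) with a limit of 5 forces a split.
        System.out.println(acquire(Arrays.asList(new Batch(0, 9999)), 5));
        // Small batches under the limit are returned whole.
        System.out.println(acquire(Arrays.asList(new Batch(0, 2), new Batch(3, 5)), 10));
    }
}
```

In this model a split can only happen on the last batch that is acquired, so smaller producer batches (lower `batch.size` and `linger.ms`) reduce how many records a split has to cut through, but they do not remove splitting altogether when the limit is very low.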
