Hi all,

I’ve been benchmarking share groups with simulated processing time in the consumers, and wanted to raise a question about record allocation across pending fetches.

Blog post with details/results: https://jack-vanlightly.com/blog/2026/5/25/kafka-share-groups-and-parallelizing-consumption-part-1-tuning-maxpollrecords

The immediate lesson from the benchmark is that max.poll.records needs to be set appropriately for share groups. In particular, it should be sized relative to group.share.partition.max.record.locks and the number of consumers per partition. With consumer groups, max.poll.records was not nearly as important, and I think the default of 500 is too high for share groups. Standard benchmarks don't highlight the issue; it only shows up when processing time is included in the benchmark.

With the defaults, I saw 300 consumers, 6 partitions, and 5 ms processing time settle at around 4.8K msg/s instead of the theoretical 60K msg/s. The reason was that a few consumers could acquire large batches, occupy most of the per-partition inflight window, and leave the rest of the consumers mostly idle. Reducing max.poll.records fixed the benchmark and allowed the same setup to sustain 60K msg/s.

So users must tune max.poll.records. But I think this is suboptimal from a usability perspective. The right value depends on the inflight record limit, consumers per partition, processing time, timing variance, partition skew, etc. The correct value can also change over time: processing time may change as application behavior changes, and the number of consumers may change if the application autoscales. My post’s rule of thumb is to divide group.share.partition.max.record.locks by consumers per partition, then set max.poll.records somewhat lower to leave room for variance.

So I wonder whether share groups should consider a less greedy allocation strategy when multiple consumers have pending fetches. Instead of filling one fetch up to max.poll.records before serving others, the broker could try to distribute available records more evenly across pending fetches, which would improve the effective parallelism.

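For anyone who wants to check the arithmetic behind the numbers above, here is a rough back-of-the-envelope sketch in Python. It is not taken from the blog post or from the broker code. It assumes an idealized steady state in which each greedy fetch fills up to max.poll.records and holds its locks while processing. It also assumes a per-partition lock limit of 2000 (an assumption, chosen because it is consistent with the 4.8K msg/s figure) and an 80% headroom factor as one possible reading of "somewhat lower". All function names are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def busy_consumers(partitions: int, consumers: int,
                   record_locks: int, max_poll_records: int) -> int:
    """Consumers that can hold records at once under greedy allocation.

    Idealized model: consumers are spread evenly over partitions, and each
    fetch takes up to max_poll_records until the partition's lock limit
    (record_locks) is exhausted.
    """
    per_partition = consumers // partitions
    lock_holders = ceil_div(record_locks, max_poll_records)
    return partitions * min(per_partition, lock_holders)


def throughput(busy: int, processing_ms: int) -> int:
    """Messages per second if each busy consumer handles one record at a time."""
    return busy * 1000 // processing_ms


def rule_of_thumb(record_locks: int, consumers_per_partition: int,
                  headroom_pct: int = 80) -> int:
    """Lock limit divided by consumers per partition, scaled down for variance.

    headroom_pct is an assumed value, not a recommendation from the post.
    """
    return max(1, record_locks * headroom_pct // (consumers_per_partition * 100))


if __name__ == "__main__":
    PARTITIONS, CONSUMERS, PROCESSING_MS = 6, 300, 5
    RECORD_LOCKS = 2000  # assumed value of group.share.partition.max.record.locks

    # Default max.poll.records of 500: only a few consumers per partition get work.
    busy = busy_consumers(PARTITIONS, CONSUMERS, RECORD_LOCKS, 500)
    print(busy, throughput(busy, PROCESSING_MS))

    # Tuned value from the rule of thumb: every consumer can hold records.
    tuned = rule_of_thumb(RECORD_LOCKS, CONSUMERS // PARTITIONS)
    busy = busy_consumers(PARTITIONS, CONSUMERS, RECORD_LOCKS, tuned)
    print(tuned, busy, throughput(busy, PROCESSING_MS))
```

Under these assumptions the model gives 24 busy consumers and 4,800 msg/s with max.poll.records=500, and all 300 consumers busy at 60,000 msg/s with the tuned value of 32. It ignores timing variance and partition skew, which is why the rule of thumb leaves headroom.
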
I’m not proposing a specific algorithm here, and there are obvious trade-offs around broker complexity, fetch efficiency, and configurability. But it seems worth discussing because fairer allocation could make share groups less sensitive to precise max.poll.records tuning and better preserve the parallelism that share groups are intended to provide.

Curious what others think.

Jack Vanlightly
