[ https://issues.apache.org/jira/browse/KAFKA-19259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17999332#comment-17999332 ]
Kirk True commented on KAFKA-19259:
-----------------------------------

[~goyarpit] the fetch eviction rate issue was found using an internal testing environment, so it's not something that can be shared.

For 4.2, I'm taking on the ambitious task of solving the consumer performance issues we've uncovered (including this one). A good chunk of that work is to build up some of our internal testing infrastructure so that we can run proposed fixes against a battery of performance workloads to ensure there aren't any regressions.

From a logistical perspective, the tests require a lot of infrastructure that can't be replicated. So while I can't make those tests self-service, if you want to provide suggested fixes, we should be able to run those in our testing infrastructure and share the logs, metrics, etc.

> Async consumer fetch intermittent delays on console consumer
> ------------------------------------------------------------
>
>                 Key: KAFKA-19259
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19259
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>    Affects Versions: 4.0.0
>            Reporter: Lianet Magrans
>            Assignee: Arpit Goyal
>            Priority: Major
>              Labels: consumer-threading-refactor
>             Fix For: 4.2.0
>
>         Attachments: Screenshot 2025-05-31 at 10.44.29 PM.png,
> console-consumer-classic-vs-consumer.mov, consumer11_KAFKA-19259.log,
> consumer_KAFKA-19259.log, debug5.log
>
>
> We noticed that fetching with the kafka-console-consumer.sh tool using the
> new consumer shows some intermittent delays, that are not seen when running
> the same with the classic consumer.
> Note that I disabled auto-commit to isolate the delay. At first look it
> seems to come from the fetchBuffer.awaitNonEmpty logic, which on alternate
> iterations takes almost the full poll timeout (runs "fast", then "slow",
> and continues to alternate)
> [https://github.com/apache/kafka/blob/0b81d6c7802c1be55dc823ce51729f2c6a6071a7/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java#L1808]
>
> The difference in behaviour between the 2 consumers can be seen with this
> setup:
> * topic with 6 partitions (I tried with 1 partition first and didn't see the
> delay, then with 3 and 6 I could see it)
> * data populated in topic with producer sending generated uuids to the topic
> in while loop
> * run console consumer (async) no commit:
> bin/kafka-console-consumer.sh --topic t1 --bootstrap-server localhost:9092 --consumer-property group.protocol=consumer --group cg1 --consumer-property enable.auto.commit=false
> Here we can notice the pattern that looks like batches, and custom logs on
> the awaitNonEmpty show it takes the full poll timeout on alternate poll
> iterations.
> * run same but for classic consumer (consumer-property
> group.protocol=classic) -> no such pattern of intermittent delays
>
> Produce continuously (I used this):
> while sleep 1; do echo $(uuidgen); done | bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic t1
>
> This needs more investigation to fully understand if it's indeed something in
> the fetch path or something else.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)