[ 
https://issues.apache.org/jira/browse/KAFKA-19259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17999332#comment-17999332
 ] 

Kirk True commented on KAFKA-19259:
-----------------------------------

[~goyarpit] the fetch eviction rate issue was found using an internal testing 
environment, so it's not something that can be shared.

For 4.2, I'm taking on the ambitious task of solving the consumer performance 
issues we've uncovered (including this one). A good chunk of that work is to 
build up some of our internal testing infrastructure so that we can run 
proposed fixes against a battery of performance workloads to ensure there 
aren't any regressions. From a logistical perspective, the tests require a lot 
of infrastructure that can't be replicated. So while I can't make those tests 
self-service, if you want to provide suggested fixes, we should be able to run 
those in our testing infrastructure and share the logs, metrics, etc.

> Async consumer fetch intermittent delays on console consumer
> ------------------------------------------------------------
>
>                 Key: KAFKA-19259
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19259
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>    Affects Versions: 4.0.0
>            Reporter: Lianet Magrans
>            Assignee: Arpit Goyal
>            Priority: Major
>              Labels: consumer-threading-refactor
>             Fix For: 4.2.0
>
>         Attachments: Screenshot 2025-05-31 at 10.44.29 PM.png, 
> console-consumer-classic-vs-consumer.mov, consumer11_KAFKA-19259.log, 
> consumer_KAFKA-19259.log, debug5.log
>
>
> We noticed that fetching with the kafka-console-consumer.sh tool using the 
> new consumer shows intermittent delays that are not seen when running the 
> same command with the classic consumer. Note that I disabled auto-commit to 
> isolate the delay. At first look it seems to come from the 
> fetchBuffer.awaitNonEmpty logic, which on alternate polls takes almost the 
> full poll timeout (runs "fast", then "slow", and continues to alternate):
> [https://github.com/apache/kafka/blob/0b81d6c7802c1be55dc823ce51729f2c6a6071a7/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java#L1808]
>   
> The difference in behaviour between the 2 consumers can be seen with this 
> setup:
>  * topic with 6 partitions (I tried with 1 partition first and didn't see the 
> delay, then with 3 and 6 I could see it) 
>  * data populated in the topic by a producer sending generated UUIDs in a 
> while loop 
>  * run console consumer (async), no commit:
> bin/kafka-console-consumer.sh --topic t1 --bootstrap-server localhost:9092 
> --consumer-property group.protocol=consumer --group cg1 --consumer-property 
> enable.auto.commit=false
> Here we can see a pattern that looks like batches, and custom logs on 
> awaitNonEmpty show it takes the full poll timeout on alternate poll 
> iterations.
>  * run the same but for the classic consumer (consumer-property 
> group.protocol=classic) -> no such pattern of intermittent delays
> Produce continuously (I used this) 
> while sleep 1; do echo $(uuidgen); done | bin/kafka-console-producer.sh 
> --bootstrap-server localhost:9092 --topic t1
> This needs more investigation to fully understand whether it's indeed 
> something in the fetch path or something else.
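
The "fast, then slow" alternation described in the quoted report can be checked
mechanically once per-poll durations have been collected (for example from
custom timing logs around poll()). The following is a minimal sketch, not code
from the issue: the function names, the 0.9 threshold, the 1000 ms poll timeout,
and the sample durations are all assumptions made for illustration.

```python
from typing import List


def classify_polls(durations_ms: List[float], poll_timeout_ms: float,
                   slow_fraction: float = 0.9) -> List[bool]:
    """Mark a poll as slow (True) if it used at least slow_fraction of the timeout."""
    # slow_fraction is an assumed cutoff for "almost the full poll timeout".
    threshold = poll_timeout_ms * slow_fraction
    return [d >= threshold for d in durations_ms]


def is_alternating(flags: List[bool]) -> bool:
    """Return True if fast and slow polls strictly alternate (needs >= 2 polls)."""
    if len(flags) < 2:
        return False
    # Every adjacent pair must differ: fast/slow/fast/... or slow/fast/slow/...
    return all(a != b for a, b in zip(flags, flags[1:]))


if __name__ == "__main__":
    # Made-up sample durations in milliseconds, not measurements from the issue.
    sample = [12.0, 985.0, 9.0, 998.0, 15.0, 991.0]
    flags = classify_polls(sample, poll_timeout_ms=1000.0)  # assumed timeout
    print(flags)
    print(is_alternating(flags))
```

Running the same check on durations captured with group.protocol=classic and
with group.protocol=consumer would give a simple yes/no comparison of the two
consumers, under the assumption that the timing logs are collected the same way
in both runs.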



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
