[ https://issues.apache.org/jira/browse/KAFKA-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051313#comment-15051313 ]
Michal Turek commented on KAFKA-2978:
-------------------------------------

Ismael, I'm afraid I have reproduced the issue using code from the 0.9.0 branch; the behavior is exactly the same as with the official 0.9.0.0 release. The following is just for repeatability...

{noformat}
git clone https://github.com/apache/kafka.git
cd kafka
git checkout 0.9.0
git log | head -n3
commit a5fa661227b0b0b7da86b10b48e94bfb87d0b71d
Author: Edward Ribeiro <edward.ribe...@gmail.com>
Date:   Wed Dec 9 20:34:09 2015 -0800

gradle
./gradlew clean
./gradlew releaseTarGz -x signArchives
# ./core/build/distributions/kafka_2.10-0.9.0.0.tgz

# Install the JARs locally to ~/.m2/repository/...
mvn install:install-file -Dfile=clients/build/libs/kafka-clients-0.9.0.0.jar -DgroupId=org.apache.kafka -DartifactId=kafka-clients -Dversion=0.9.0.0-localbuild -Dpackaging=jar
mvn install:install-file -Dfile=clients/build/libs/kafka-clients-0.9.0.0-sources.jar -DgroupId=org.apache.kafka -DartifactId=kafka-clients -Dversion=0.9.0.0-localbuild -Dclassifier=sources -Dpackaging=jar
mvn install:install-file -Dfile=clients/build/libs/kafka-clients-0.9.0.0-javadoc.jar -DgroupId=org.apache.kafka -DartifactId=kafka-clients -Dversion=0.9.0.0-localbuild -Dclassifier=javadoc -Dpackaging=jar

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.9.0.0-localbuild</version>
</dependency>
{noformat}

> Topic partition is not sometimes consumed after rebalancing of consumer group
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-2978
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2978
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer, core
>    Affects Versions: 0.9.0.0
>            Reporter: Michal Turek
>            Assignee: Neha Narkhede
>            Priority: Critical
>             Fix For: 0.9.0.1
>
>
> Hi there, we are evaluating Kafka 0.9 to find out whether it is stable enough
> and ready for production. We wrote a tool that basically verifies that each
> produced message is also properly consumed. We found the issue described
> below while stressing Kafka using this tool.
> Adding more and more consumers to a consumer group may result in unsuccessful
> rebalancing. Data from one or more partitions are not consumed and are
> effectively unavailable to the client application (e.g. for 15 minutes).
> The situation can be resolved externally by touching the consumer group again
> (adding or removing a consumer), which forces another rebalancing that may or
> may not be successful.
> Significantly higher CPU utilization was observed in such cases (from about
> 3% to 17%). The increased CPU utilization occurs in both the affected
> consumer and the Kafka broker, according to htop and profiling with jvisualvm.
> Jvisualvm indicates the issue may be related to KAFKA-2936 (see its
> screenshots in the GitHub repo below), but I'm very unsure. I also don't know
> whether the issue is in the consumer or the broker, because both are affected
> and I don't know Kafka internals.
> The issue is not deterministic, but it can be easily reproduced within a few
> minutes just by starting more and more consumers. More parallelism with
> multiple CPUs probably gives the issue more chances to appear.
> The tool itself, together with very detailed instructions for quite reliable
> reproduction of the issue and an initial analysis, is available here:
> - https://github.com/avast/kafka-tests
> - https://github.com/avast/kafka-tests/tree/issue1/issues/1_rebalancing
> - Prefer the fixed tag {{issue1}} to the branch {{master}}, which may change.
> - Note there are also various screenshots of jvisualvm together with full
> logs from all components of the tool.
> My colleague was able to independently reproduce the issue by following the
> instructions above. If you have any questions or if you need any help with
> the tool, just let us know. This issue is a blocker for us.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)