[ https://issues.apache.org/jira/browse/KAFKA-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang updated KAFKA-2978:
---------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Issue resolved by pull request 666
[https://github.com/apache/kafka/pull/666]

> Topic partition is not sometimes consumed after rebalancing of consumer group
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-2978
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2978
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer, core
>    Affects Versions: 0.9.0.0
>            Reporter: Michal Turek
>            Assignee: Jason Gustafson
>            Priority: Critical
>             Fix For: 0.9.0.1
>
>
> Hi there, we are evaluating Kafka 0.9 to find out whether it is stable enough and ready for production. We wrote a tool that verifies that each produced message is also properly consumed. We found the issue described below while stress-testing Kafka with this tool.
>
> Adding more and more consumers to a consumer group may result in unsuccessful rebalancing: data from one or more partitions are not consumed and are effectively unavailable to the client application (e.g. for 15 minutes). The situation can be resolved externally by touching the consumer group again (adding or removing a consumer), which forces another rebalance that may or may not succeed.
>
> Significantly higher CPU utilization was observed in such cases (from about 3% to 17%). According to htop and profiling with jvisualvm, the extra CPU is used in both the affected consumer and the Kafka broker.
>
> Jvisualvm indicates the issue may be related to KAFKA-2936 (see its screenshots in the GitHub repo below), but I'm very unsure. I also don't know whether the issue is in the consumer or the broker, because both are affected and I don't know Kafka internals.
>
> The issue is not deterministic, but it can easily be reproduced within a few minutes just by starting more and more consumers. More parallelism with multiple CPUs probably gives the issue more chances to appear.
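The verification idea the reporter describes (checking that every produced message is eventually consumed, and noticing partitions that stop delivering after a rebalance) can be sketched as below. This is a minimal, hypothetical illustration, not the code of the avast/kafka-tests tool: identifying records by `(partition, message_id)`, the helper names `find_unconsumed` and `stalled_partitions`, and the `timeout_s` threshold are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record identity: (partition, message_id). The real tool may
# identify messages differently; this is an assumption for illustration.
Record = Tuple[int, str]


def find_unconsumed(produced: Iterable[Record],
                    consumed: Iterable[Record]) -> Dict[int, Set[str]]:
    """Return, per partition, the message IDs that were produced but never consumed."""
    seen: Set[Record] = set(consumed)
    missing: Dict[int, Set[str]] = defaultdict(set)
    for partition, msg_id in produced:
        if (partition, msg_id) not in seen:
            missing[partition].add(msg_id)
    return dict(missing)


def stalled_partitions(missing: Dict[int, Set[str]],
                       last_consumed_at: Dict[int, float],
                       now: float,
                       timeout_s: float) -> Set[int]:
    """Partitions that still have unconsumed messages but have delivered
    nothing for longer than timeout_s seconds (or have never delivered)."""
    return {
        p for p in missing
        if now - last_consumed_at.get(p, float("-inf")) > timeout_s
    }


if __name__ == "__main__":
    produced = [(0, "a"), (0, "b"), (1, "c"), (1, "d")]
    consumed = [(0, "a"), (0, "b"), (1, "c")]
    missing = find_unconsumed(produced, consumed)
    print(missing)  # {1: {'d'}}
    print(stalled_partitions(missing, {0: 100.0, 1: 10.0}, now=1000.0, timeout_s=60.0))  # {1}
```

Requiring both conditions (unconsumed messages and no recent delivery) avoids flagging a partition that is merely idle because nothing new was produced to it.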
>
> The tool itself, together with very detailed instructions for quite reliably reproducing the issue and an initial analysis, is available here:
> - https://github.com/avast/kafka-tests
> - https://github.com/avast/kafka-tests/tree/issue1/issues/1_rebalancing
> - Prefer the fixed tag {{issue1}} to the branch {{master}}, which may change.
> - Note there are also various screenshots of jvisualvm together with full logs from all components of the tool.
>
> My colleague was able to independently reproduce the issue by following the instructions above. If you have any questions or need any help with the tool, just let us know. This issue is a blocker for us.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)