[
https://issues.apache.org/jira/browse/KAFKA-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053206#comment-15053206
]
Jason Gustafson commented on KAFKA-2978:
----------------------------------------
[[email protected]] Thanks for confirming that the fix worked. I'm not sure
about the release timeline for 0.9.0.1, but my guess is that they'll probably
give it a few more weeks to shake out any other critical bugs. What do you
think, [~guozhang]?
> Topic partition is not sometimes consumed after rebalancing of consumer group
> -----------------------------------------------------------------------------
>
> Key: KAFKA-2978
> URL: https://issues.apache.org/jira/browse/KAFKA-2978
> Project: Kafka
> Issue Type: Bug
> Components: consumer, core
> Affects Versions: 0.9.0.0
> Reporter: Michal Turek
> Assignee: Jason Gustafson
> Priority: Critical
> Fix For: 0.9.0.1
>
>
> Hi there, we are evaluating Kafka 0.9 to find out whether it is stable enough
> and ready for production. We wrote a tool that basically verifies that each
> produced message is also properly consumed. We found the issue described
> below while stressing Kafka using this tool.
> Adding more and more consumers to a consumer group may result in unsuccessful
> rebalancing. Data from one or more partitions is then not consumed and is
> effectively unavailable to the client application (e.g. for 15 minutes).
> The situation can be resolved externally by touching the consumer group again
> (adding or removing a consumer), which forces another rebalance that may or
> may not be successful.
> Significantly higher CPU utilization was observed in such cases (from about
> 3% to 17%). The increase occurs in both the affected consumer and the Kafka
> broker, according to htop and profiling with jvisualvm.
> Jvisualvm indicates the issue may be related to KAFKA-2936 (see its
> screenshots in the GitHub repo below), but I'm very unsure. I also don't know
> whether the issue is in the consumer or the broker, because both are affected
> and I don't know Kafka internals.
> The issue is not deterministic, but it can be easily reproduced after a few
> minutes just by starting more and more consumers. More parallelism with
> multiple CPUs probably gives the issue more chances to appear.
> The tool itself, together with very detailed instructions for reproducing the
> issue quite reliably and an initial analysis, is available here:
> - https://github.com/avast/kafka-tests
> - https://github.com/avast/kafka-tests/tree/issue1/issues/1_rebalancing
> - Prefer the fixed tag {{issue1}} to the branch {{master}}, which may change.
> - Note there are also various screenshots of jvisualvm together with full
> logs from all components of the tool.
> My colleague was able to independently reproduce the issue according to the
> instructions above. If you have any questions or if you need any help with
> the tool, just let us know. This issue is a blocker for us.