[ https://issues.apache.org/jira/browse/KAFKA-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang updated KAFKA-2978:
---------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Issue resolved by pull request 666
[https://github.com/apache/kafka/pull/666]

> Topic partition is not sometimes consumed after rebalancing of consumer group
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-2978
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2978
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer, core
>    Affects Versions: 0.9.0.0
>            Reporter: Michal Turek
>            Assignee: Jason Gustafson
>            Priority: Critical
>             Fix For: 0.9.0.1
>
>
> Hi there, we are evaluating Kafka 0.9 to determine whether it is stable 
> enough and ready for production. We wrote a tool that basically verifies that 
> each produced message is also properly consumed. We found the issue described 
> below while stressing Kafka using this tool.
> Adding more and more consumers to a consumer group may result in unsuccessful 
> rebalancing. Data from one or more partitions is then not consumed and is 
> effectively unavailable to the client application (e.g. for 15 minutes). 
> The situation can be resolved externally by touching the consumer group again 
> (adding or removing a consumer), which forces another rebalancing that may or 
> may not be successful.
> Significantly higher CPU utilization was observed in such cases (from about 
> 3% to 17%). The increase occurs in both the affected consumer and the 
> Kafka broker according to htop and profiling using jvisualvm. 
> Jvisualvm indicates the issue may be related to KAFKA-2936 (see its 
> screenshots in the GitHub repo below), but I'm very unsure. I also don't know 
> whether the issue is in the consumer or the broker, because both are affected 
> and I don't know Kafka internals.
> The issue is not deterministic, but it can be easily reproduced after a few 
> minutes just by starting more and more consumers. More parallelism with 
> multiple CPUs probably gives the issue more chances to appear.
> The tool itself together with very detailed instructions for quite reliable 
> reproduction of the issue and initial analysis are available here:
> - https://github.com/avast/kafka-tests
> - https://github.com/avast/kafka-tests/tree/issue1/issues/1_rebalancing
> - Prefer the fixed tag {{issue1}} to the branch {{master}}, which may change.
> - Note there are also various screenshots of jvisualvm together with full 
> logs from all components of the tool.
> My colleague was able to independently reproduce the issue according to the 
> instructions above. If you have any questions or if you need any help with 
> the tool, just let us know. This issue is a blocker for us.
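
The description above mentions a tool that verifies each produced message is also consumed. As a rough illustration only, the bookkeeping behind such a check might look like the sketch below. It is not taken from the avast/kafka-tests repository and uses no Kafka client; the class name `DeliveryTracker`, its methods, and the lag threshold are all hypothetical names chosen for this example.

```python
from collections import defaultdict


class DeliveryTracker:
    """Hypothetical helper: remembers produced message ids until they are consumed."""

    def __init__(self):
        # partition -> {message_id: time the message was produced}
        self._pending = defaultdict(dict)

    def produced(self, partition, message_id, timestamp):
        # Record a message as produced but not yet seen by any consumer.
        self._pending[partition][message_id] = timestamp

    def consumed(self, partition, message_id):
        # pop() with a default keeps duplicate deliveries harmless.
        self._pending[partition].pop(message_id, None)

    def stalled_partitions(self, now, max_lag_seconds):
        # A partition counts as stalled if any of its messages has been
        # waiting longer than max_lag_seconds (an assumed threshold).
        return sorted(
            partition
            for partition, messages in self._pending.items()
            if any(now - ts > max_lag_seconds for ts in messages.values())
        )
```

In a setup like the one reported, the producer side would call `produced(...)`, each consumer would call `consumed(...)`, and a periodic check of `stalled_partitions(...)` would flag a partition that stopped being consumed after a rebalance.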



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
