[ https://issues.apache.org/jira/browse/KAFKA-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052149#comment-15052149 ]

ASF GitHub Bot commented on KAFKA-2978:
---------------------------------------

GitHub user hachikuji opened a pull request:

    https://github.com/apache/kafka/pull/666

    KAFKA-2978: consumer stops fetching when consumed and fetch positions get out of sync

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hachikuji/kafka KAFKA-2978

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/666.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #666
    
----
commit 8a441def79cc8fa21da97759068c0caf7b7b425a
Author: Jason Gustafson <ja...@confluent.io>
Date:   2015-12-11T04:30:43Z

    KAFKA-2978: consumer stops fetching when consumed and fetch positions get out of sync

----
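
The PR title names the failure mode: the consumer stops fetching when its consumed and fetch positions get out of sync. The Java toy model below is a rough illustration of how such a stall can arise, under stated assumptions. The class `ToyPartitionState`, the rule that a new fetch is sent only when the two positions agree, and the "discard buffered records" trigger are all hypothetical simplifications. They are not the actual 0.9 consumer code and not the change made in PR #666.

```java
// Toy model of the failure mode named in the PR title. Hypothetical and
// simplified: this is NOT Kafka's Fetcher or SubscriptionState code.
final class ToyPartitionState {
    private long consumed; // next offset to hand to the application
    private long fetched;  // next offset to request from the broker
    private int buffered;  // records fetched but not yet returned

    ToyPartitionState(long start) {
        consumed = start;
        fetched = start;
    }

    // Assumed rule: fetch again only once everything fetched was consumed.
    boolean shouldFetch() {
        return consumed == fetched;
    }

    // The broker returned n records starting at `fetched`.
    void onFetchResponse(int n) {
        fetched += n;
        buffered += n;
    }

    // A poll hands all buffered records to the application.
    void poll() {
        consumed += buffered;
        buffered = 0;
    }

    // Assumed trigger: buffered records are dropped (e.g. around a
    // rebalance) without rewinding `fetched` back to `consumed`.
    void discardBuffer() {
        buffered = 0;
    }

    public static void main(String[] args) {
        ToyPartitionState p = new ToyPartitionState(0);
        p.onFetchResponse(10);
        p.poll();
        System.out.println(p.shouldFetch()); // true: both positions are 10
        p.onFetchResponse(10);               // fetched = 20, 10 records buffered
        p.discardBuffer();                   // consumed stays at 10
        p.poll();                            // nothing left to return
        System.out.println(p.shouldFetch()); // false, and it stays false
    }
}
```

In this model nothing can advance `consumed` once the buffer is empty, so the partition stalls until some outside event resets the state. Rewinding `fetched` to `consumed` inside `discardBuffer()` would remove the stall in the toy. This says nothing about how the real patch fixed the client.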


> Topic partition is not sometimes consumed after rebalancing of consumer group
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-2978
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2978
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer, core
>    Affects Versions: 0.9.0.0
>            Reporter: Michal Turek
>            Assignee: Jason Gustafson
>            Priority: Critical
>             Fix For: 0.9.0.1
>
>
> Hi there, we are evaluating Kafka 0.9 to determine whether it is stable 
> enough and ready for production. We wrote a tool that verifies that each 
> produced message is also properly consumed, and we found the issue described 
> below while stress-testing Kafka with it.
> Adding more and more consumers to a consumer group may result in an 
> unsuccessful rebalance. Data from one or more partitions is then not consumed 
> and is effectively unavailable to the client application (e.g. for 15 
> minutes). The situation can be resolved externally by touching the consumer 
> group again (adding or removing a consumer), which forces another rebalance 
> that may or may not succeed.
> Significantly higher CPU utilization was observed in such cases (from about 
> 3% to 17%). According to htop and profiling with jvisualvm, the extra CPU is 
> used by both the affected consumer and the Kafka broker.
> Jvisualvm indicates the issue may be related to KAFKA-2936 (see the 
> screenshots in the GitHub repo below), but I'm very unsure. I also don't know 
> whether the issue is in the consumer or the broker, because both are affected 
> and I don't know Kafka internals.
> The issue is not deterministic, but it can easily be reproduced within a few 
> minutes simply by starting more and more consumers. More parallelism with 
> multiple CPUs probably gives the issue more chances to appear.
> The tool itself, together with very detailed instructions for reproducing the 
> issue fairly reliably and an initial analysis, is available here:
> - https://github.com/avast/kafka-tests
> - https://github.com/avast/kafka-tests/tree/issue1/issues/1_rebalancing
> - Prefer the fixed tag {{issue1}} to the branch {{master}}, which may change.
> - Note that there are also various jvisualvm screenshots, together with full 
> logs from all components of the tool.
> My colleague was able to reproduce the issue independently by following the 
> instructions above. If you have any questions or need any help with the tool, 
> just let us know. This issue is a blocker for us.
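
The reporters' tool checks that every produced message is eventually consumed. As an illustration of that kind of bookkeeping only, here is a minimal Java sketch. `DeliveryTracker`, its methods, and the timeout-based "overdue" check are hypothetical names and assumptions; this is not code from avast/kafka-tests and uses no Kafka client API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch, NOT the avast/kafka-tests implementation. It tracks
// message sequence numbers so a test harness can report messages that were
// produced but not consumed within a deadline.
final class DeliveryTracker {
    // seq -> time (ms) the message was handed to the producer;
    // the entry is removed once the message is consumed.
    private final Map<Long, Long> outstanding = new TreeMap<>();
    // every seq consumed at least once, used to detect redeliveries
    private final Set<Long> consumedOnce = new HashSet<>();
    private long duplicates = 0;

    // Assumption: called before the message is sent, so a consumer never
    // observes a seq that has not been registered here yet.
    synchronized void onProduced(long seq, long nowMs) {
        outstanding.put(seq, nowMs);
    }

    synchronized void onConsumed(long seq) {
        if (!consumedOnce.add(seq)) {
            duplicates++; // redelivery, e.g. after a rebalance (at-least-once)
        }
        outstanding.remove(seq);
    }

    // Messages still unconsumed more than timeoutMs after being produced,
    // in ascending sequence order (TreeMap iterates keys in order).
    synchronized List<Long> overdue(long nowMs, long timeoutMs) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Long> e : outstanding.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    synchronized long duplicates() {
        return duplicates;
    }

    public static void main(String[] args) {
        DeliveryTracker t = new DeliveryTracker();
        for (long seq = 0; seq < 5; seq++) {
            t.onProduced(seq, 0L);
        }
        t.onConsumed(0);
        t.onConsumed(1);
        t.onConsumed(1); // delivered twice
        t.onConsumed(3);
        // 15 minutes later with a 1-minute deadline, 2 and 4 are overdue
        System.out.println(t.overdue(15 * 60_000L, 60_000L)); // [2, 4]
        System.out.println(t.duplicates());                   // 1
    }
}
```

Keying on a per-message sequence number lets a harness tell messages that never arrive (as with the unconsumed partitions described above) apart from redeliveries, which an at-least-once consumer can legitimately produce after a rebalance.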



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
