[jira] [Commented] (KAFKA-5016) Consumer hang in poll method while rebalancing is in progress

Geoffrey Stewart (JIRA) Fri, 14 Jul 2017 16:34:44 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088285#comment-16088285
 ]


Geoffrey Stewart commented on KAFKA-5016:
-----------------------------------------

I have also encountered the issue documented in this Jira using 0.10.2.0 
brokers with the 0.10.2.0 client.  This issue only occurs when we use the 
"subscribe" call from the API, which dynamically assigns partitions.  When we 
use the "assign" call from the API, to manually assign lists of partitions, we 
do not have any issue.  I don't think what is being described above represents 
the expected behavior of dynamic partition assignment and consumer group 
coordination.  Based on the above explanation it sounds like it would not be 
possible to have 2 or more simultaneous consumer instances in the same consumer 
group when using dynamic partition assignment (subscribe).  For example, there 
could be one consumer instance in the group which has made some calls to 
"poll".  As soon as a second consumer instance comes along, its call to "poll" 
is only processed after max.poll.interval.ms has elapsed since the first 
consumer's most recent poll request, at which point the broker no longer 
considers the first consumer to be part of the group.  I certainly agree that 
with the arrival of the second consumer to the group, the broker must perform a 
rebalance or restabilization, which may take some time.  However, this should 
not take max.poll.interval.ms, since the liveness of the first consumer should 
be maintained by its heartbeat, which occurs every heartbeat.interval.ms.  I have 
confirmed that with the default max.poll.interval.ms of 300000, the group 
restabilization (rebalance) takes about that long (5 minutes), and only then is 
the second consumer instance's poll request processed.  Lowering this value to 
30000 reduces the group restabilization (rebalance) to about 30 seconds before 
the second consumer instance's poll request is processed.
To summarize, please explain how I can establish parallel consumer instances in 
the same group using the subscribe method from the API, which dynamically 
assigns partitions.  Further, please help me to understand why the consumer 
instance's heartbeat does not seem to be maintaining its liveness.
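
For reference, the timing settings named above can be collected in a small 
Java sketch.  The property keys are the ones discussed in this comment; the 
class name, the helper method, the broker address, the group name, and the 
heartbeat and session timeout values are assumptions chosen for illustration, 
not taken from the attached test case.

```java
import java.util.Properties;

// Hypothetical helper: gathers the consumer timing settings discussed above.
class ConsumerTimeoutSketch {

    // Builds consumer properties for a given group and max.poll.interval.ms.
    static Properties consumerProps(String groupId, int maxPollIntervalMs) {
        Properties props = new Properties();
        // Assumed broker address, for illustration only.
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", groupId);
        // The setting that appeared to bound the rebalance time in the tests above.
        props.setProperty("max.poll.interval.ms", Integer.toString(maxPollIntervalMs));
        // Illustrative values (assumptions), not measured or recommended ones.
        props.setProperty("heartbeat.interval.ms", "3000");
        props.setProperty("session.timeout.ms", "10000");
        return props;
    }

    public static void main(String[] args) {
        // The two values compared in the comment: 300000 (default) and 30000.
        Properties slow = consumerProps("test-group", 300000);
        Properties fast = consumerProps("test-group", 30000);
        System.out.println("default: " + slow.getProperty("max.poll.interval.ms"));
        System.out.println("lowered: " + fast.getProperty("max.poll.interval.ms"));
    }
}
```

In a real client these properties, together with the key and value 
deserializer settings omitted here, would be passed to the KafkaConsumer 
constructor before calling "subscribe".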

> Consumer hang in poll method while rebalancing is in progress
> -------------------------------------------------------------
>
>                 Key: KAFKA-5016
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5016
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.10.1.0, 0.10.2.0
>            Reporter: Domenico Di Giulio
>            Assignee: Vahid Hashemian
>         Attachments: Kafka 0.10.2.0 Issue (TRACE) - Server + Client.txt, 
> Kafka 0.10.2.0 Issue (TRACE).txt, KAFKA_5016.java
>
>
> After moving to Kafka 0.10.2.0, it looks like I'm experiencing a hang in the 
> rebalancing code. 
> This is a test case, not (yet) production code. It does the following with 
> a single-partition topic and two consumers in the same group:
> 1) a topic with one partition is forced to be created (auto-created)
> 2) a producer is used to write 10 messages
> 3) the first consumer reads all the messages and commits
> 4) the second consumer attempts a poll() and hangs indefinitely
> The same issue can't be found with 0.10.0.0.
> See the attached logs at TRACE level. Look for "SERVER HANGS" to see where 
> the hang is found: when this happens, the client keeps failing every heartbeat 
> attempt, as the rebalancing is in progress, and the poll method hangs 
> indefinitely.



