[ 
https://issues.apache.org/jira/browse/KAFKA-5016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149533#comment-16149533
 ] 

Jason Gustafson commented on KAFKA-5016:
----------------------------------------

[~gdstewar] Let me explain the intended behavior:

1. Rebalances only occur when the user calls {{poll()}}. This is how we know 
that the user has finished processing the records returned from the last 
{{poll()}}, which means it is safe to reassign partitions.
2. The heartbeat thread only keeps the consumer in the group. If a rebalance 
begins, the consumer will continue sending heartbeats in the background, but 
the rebalance won't complete until the next {{poll()}}.
3. Because of 1 and 2, the rebalance timeout is equal to 
{{max.poll.interval.ms}}. If the consumer does not call {{poll()}} before 
{{max.poll.interval.ms}} expires, then it may be kicked out of the group in 
spite of the heartbeat thread. This is designed to prevent cases in which the 
application has effectively stalled while the heartbeat thread is still active. 
Basically the consumer has to demonstrate progress (by calling {{poll()}}) to 
stay in the group.
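
To make points 1 to 3 concrete, here is a toy model of the membership rule. 
It is only a sketch and not Kafka's coordinator or client code: the class 
{{MembershipModel}}, its methods, and the timeout values passed in are all 
hypothetical, chosen for illustration. It assumes a member stays in the group 
only while both its last heartbeat is within the session timeout and its last 
{{poll()}} is within {{max.poll.interval.ms}}.

```java
// Toy model only: NOT Kafka's implementation. All names are hypothetical.
final class MembershipModel {
    private final long sessionTimeoutMs;   // stands in for session.timeout.ms
    private final long maxPollIntervalMs;  // stands in for max.poll.interval.ms
    private long lastHeartbeatMs;
    private long lastPollMs;

    MembershipModel(long sessionTimeoutMs, long maxPollIntervalMs, long joinedAtMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.maxPollIntervalMs = maxPollIntervalMs;
        this.lastHeartbeatMs = joinedAtMs;
        this.lastPollMs = joinedAtMs;
    }

    // Called by the (modelled) background heartbeat thread.
    void heartbeat(long nowMs) { lastHeartbeatMs = nowMs; }

    // Called when the (modelled) application invokes poll().
    void poll(long nowMs) { lastPollMs = nowMs; }

    // A member must be alive (heartbeats) AND making progress (poll calls).
    boolean isMember(long nowMs) {
        boolean alive = nowMs - lastHeartbeatMs <= sessionTimeoutMs;
        boolean progressing = nowMs - lastPollMs <= maxPollIntervalMs;
        return alive && progressing;
    }

    public static void main(String[] args) {
        // Illustrative values: 10 s session timeout, 5 min poll interval.
        MembershipModel m = new MembershipModel(10_000, 300_000, 0);

        // Heartbeats keep arriving, but the application never calls poll().
        m.heartbeat(297_000);
        System.out.println(m.isMember(299_000)); // true: still inside the poll interval

        m.heartbeat(303_000);
        System.out.println(m.isMember(303_000)); // false: heartbeats alone are not enough
    }
}
```

In this model the heartbeat only proves liveness; the second check is what 
forces a stalled application out of the group.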

Given that, can you explain the specific problem you are seeing?



> Consumer hang in poll method while rebalancing is in progress
> -------------------------------------------------------------
>
>                 Key: KAFKA-5016
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5016
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.10.1.0, 0.10.2.0
>            Reporter: Domenico Di Giulio
>            Assignee: Vahid Hashemian
>         Attachments: Kafka 0.10.2.0 Issue (TRACE) - Server + Client.txt, 
> Kafka 0.10.2.0 Issue (TRACE).txt, KAFKA_5016.java
>
>
> After moving to Kafka 0.10.2.0, it looks like I'm experiencing a hang in the 
> rebalancing code. 
> This is a test case, not (yet) production code. It does the following with 
> a single-partition topic and two consumers in the same group:
> 1) a topic with one partition is forced to be created (auto-created)
> 2) a producer is used to write 10 messages
> 3) the first consumer reads all the messages and commits
> 4) the second consumer attempts a poll() and hangs indefinitely
> The same issue can't be found with 0.10.0.0.
> See the attached logs at TRACE level. Look for "SERVER HANGS" to see where 
> the hang is found: when this happens, the client keeps failing every heartbeat 
> attempt, as the rebalancing is in progress, and the poll method hangs 
> indefinitely.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
