[ 
https://issues.apache.org/jira/browse/KAFKA-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810045#comment-16810045
 ] 

Guozhang Wang commented on KAFKA-4600:
--------------------------------------

Sure.

The issue of KAFKA-4600 is the following sequence (I did not include the line 
number since in trunk it has been changed compared with the original PR 
https://github.com/apache/kafka/pull/3181, but you can search for the function 
name to get it):

1. First round-trip completed, "rejoinNeeded = false" called on 
{{JoinGroupResponseHandler}}.
2. Second round-trip complete, {{SyncGroupResponseHandler}} calls 
{{onJoinComplete}}, which first set the new assignment, and then calls the 
{{RebalanceListener#onPartitionsAssigned}}, which throws an exception and then 
got swallowed as an error message.
3. Consumer will continue fetching from the newly assigned partitions, no 
re-join group will be issued since in step 1) rejoinNeeded is already set to 
false.

The fix of KAFKA-5154 is that we delay "rejoinNeeded = false" to the end of 
{{SyncGroupResponseHandler}}, not {{JoinGroupResponseHandler}}. So in the above 
sequence, at step 3:

3. ConsumerCoordinator see rejoinNeeded is true, and hence revoke its 
assignment immediately and then send JoinGroupRequest again, no data from those 
partitions will be fetched.

Note as I mentioned above, if the error thrown from the 
{{onPartitionsAssigned}} is just transient, then a later rebalance will succeed 
(hopefully), if it is consistent due to bugs, then consumer will falls into 
this endless loop of rejoining group and failing the callback, but no data will 
be fetched.

> Consumer proceeds on when ConsumerRebalanceListener fails
> ---------------------------------------------------------
>
>                 Key: KAFKA-4600
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4600
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.10.1.1
>            Reporter: Braedon Vickers
>            Priority: Major
>
> One of the use cases for a ConsumerRebalanceListener is to load state 
> necessary for processing a partition when it is assigned. However, when 
> ConsumerRebalanceListener.onPartitionsAssigned() fails for some reason (i.e. 
> the state isn't loaded), the error is logged and the consumer proceeds on as 
> if nothing happened, happily consuming messages from the new partition. When 
> the state is relied upon for correct processing, this can be very bad, e.g. 
> data loss can occur.
> It would be better if the error was propagated up so it could be dealt with 
> normally. At the very least the assignment should fail so the consumer 
> doesn't see any messages from the new partitions, and the rebalance can be 
> reattempted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to