[ https://issues.apache.org/jira/browse/KAFKA-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806234#comment-16806234 ]

Guozhang Wang commented on KAFKA-4600:
--------------------------------------

Hello [~dana.powers], the root cause is about when we set the `need-rejoin`
boolean flag to false. Prior to KAFKA-5154 it was reset as soon as the
join-group response was received. So if an error occurred after that, e.g.
during the sync-group round trip or, as in this ticket, inside the onAssign
callback, the consumer would just continue fetching from the previously
assigned partitions, which is what this ticket reportedly observed. In
KAFKA-5154 we moved `resetJoinGroupFuture()` after the `onJoinComplete` code,
which covers this case: if the error is thrown inside the callback, the
consumer will not proceed to fetch from the previously assigned partitions.
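To make the ordering concrete, here is a toy model (hypothetical, not Kafka's actual coordinator code). `ToyCoordinator`, `AssignCallback`, `completeJoin` and `canFetch` are invented names; the only point is where the "need rejoin" flag is cleared relative to the user callback. With the pre-KAFKA-5154 ordering a throwing callback still leaves the consumer free to fetch; with the flag cleared only after the callback succeeds, it does not.

```java
/**
 * Toy model (hypothetical, not Kafka's ConsumerCoordinator) of clearing the
 * "need rejoin" flag before vs. after the assignment callback runs.
 */
public class RejoinOrderingSketch {

    /** Hypothetical stand-in for ConsumerRebalanceListener.onPartitionsAssigned. */
    interface AssignCallback {
        void onAssigned();
    }

    static final class ToyCoordinator {
        // true models the pre-KAFKA-5154 ordering (flag reset on join response)
        private final boolean resetBeforeCallback;
        private boolean needRejoin = true;

        ToyCoordinator(boolean resetBeforeCallback) {
            this.resetBeforeCallback = resetBeforeCallback;
        }

        /** Models the tail of a join: flag handling plus the user callback. */
        void completeJoin(AssignCallback callback) {
            if (resetBeforeCallback) {
                needRejoin = false; // old ordering: cleared before the callback runs
            }
            try {
                callback.onAssigned();
            } catch (RuntimeException e) {
                // Error is only logged; in the new ordering the flag stays set.
                System.err.println("User provided listener failed on partition assignment: "
                        + e.getMessage());
                return;
            }
            needRejoin = false; // new ordering: cleared only after the callback succeeds
        }

        /** Fetching is allowed only when no rejoin is pending. */
        boolean canFetch() {
            return !needRejoin;
        }
    }

    public static void main(String[] args) {
        AssignCallback failing = () -> { throw new IllegalStateException("state not loaded"); };

        ToyCoordinator before = new ToyCoordinator(true);
        before.completeJoin(failing);
        System.out.println("pre-KAFKA-5154 ordering, can fetch: " + before.canFetch());

        ToyCoordinator after = new ToyCoordinator(false);
        after.completeJoin(failing);
        System.out.println("post-KAFKA-5154 ordering, can fetch: " + after.canFetch());
    }
}
```

Running `main` prints `can fetch: true` for the old ordering and `can fetch: false` for the new one.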

As for error propagation, right now we already log an ERROR, `User provided
listener {} failed on partition assignment`, and with the fix in KAFKA-5154
such a failure will block the consumer from proceeding.
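A second toy model (again hypothetical, not Kafka's code) of what "block the consumer from proceeding" could look like from the poll loop. It assumes, for illustration only, that a pending rejoin is simply re-attempted on the next `poll()` and that no records are returned until the callback succeeds. `ToyConsumer`, `joinAttempts` and the record string are invented; the printed message imitates the ERROR line quoted above.

```java
import java.util.List;

/**
 * Toy poll loop (hypothetical): a failing assignment callback is logged,
 * nothing is fetched, and the join is retried on the next poll.
 */
public class RetryOnNextPollSketch {

    /** Hypothetical stand-in for ConsumerRebalanceListener.onPartitionsAssigned. */
    interface AssignCallback {
        void onAssigned();
    }

    static final class ToyConsumer {
        private boolean needRejoin = true;
        private final AssignCallback callback;
        int joinAttempts = 0;

        ToyConsumer(AssignCallback callback) {
            this.callback = callback;
        }

        List<String> poll() {
            if (needRejoin) {
                joinAttempts++;
                try {
                    callback.onAssigned();
                    needRejoin = false; // cleared only after the callback succeeds
                } catch (RuntimeException e) {
                    System.err.println("User provided listener failed on partition assignment: "
                            + e.getMessage());
                    return List.of(); // blocked: nothing is fetched this round
                }
            }
            // Assumed placeholder for records fetched from the assigned partitions.
            return List.of("record-from-assigned-partition");
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails on the first invocation (state not loaded), succeeds afterwards.
        AssignCallback flaky = () -> {
            if (calls[0]++ == 0) {
                throw new IllegalStateException("state not loaded");
            }
        };
        ToyConsumer consumer = new ToyConsumer(flaky);
        System.out.println("poll 1: " + consumer.poll());
        System.out.println("poll 2: " + consumer.poll());
        System.out.println("join attempts: " + consumer.joinAttempts);
    }
}
```

Here `poll 1` returns an empty list, `poll 2` returns the placeholder record, and two join attempts are recorded.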

> Consumer proceeds on when ConsumerRebalanceListener fails
> ---------------------------------------------------------
>
>                 Key: KAFKA-4600
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4600
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.10.1.1
>            Reporter: Braedon Vickers
>            Priority: Major
>             Fix For: 0.11.0.0
>
>
> One of the use cases for a ConsumerRebalanceListener is to load state 
> necessary for processing a partition when it is assigned. However, when 
> ConsumerRebalanceListener.onPartitionsAssigned() fails for some reason (i.e. 
> the state isn't loaded), the error is logged and the consumer proceeds on as 
> if nothing happened, happily consuming messages from the new partition. When 
> the state is relied upon for correct processing, this can be very bad, e.g. 
> data loss can occur.
> It would be better if the error was propagated up so it could be dealt with 
> normally. At the very least the assignment should fail so the consumer 
> doesn't see any messages from the new partitions, and the rebalance can be 
> reattempted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
