[ 
https://issues.apache.org/jira/browse/KAFKA-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844103#comment-16844103
 ] 

Guozhang Wang commented on KAFKA-4600:
--------------------------------------

[~braedon] Thanks for your feedback. The reason I chose to keep the
partition after an unsuccessful revocation is that, in the rebalance protocol,
a partition is NOT re-assigned until it is clear that no one currently owns
it -- i.e. until it is re-assignable. In the above case, for example, if
`consumer.poll` is called again, the consumer will send the join-group
request again, claiming that it still owns partition 1, and hence the
partition will not be re-assigned elsewhere. If a new generation has already
been formed before the consumer retries `poll`, then its join-group or
commit-offset request will be rejected with a fatal error, and the consumer
has to clear all of its owned partitions as "lost" and rejoin as a new member.
In either case, users do not need to worry about having consumed some
messages unsafely due to a partial revocation: either they still own the
partition and no one else gets messages from it, or they have already lost it
and, even though they may fetch some messages before trying to heartbeat /
commit offsets, their commits are doomed to fail, similar to the current
situation. Does that make sense to you?
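
To make the two cases concrete, here is a minimal toy model in Java. It is
only a sketch of the reasoning above, not Kafka's coordinator code: the class
`ToyGroupCoordinator`, its methods, and `StaleGenerationException` are all
hypothetical names invented for this illustration, and the real protocol
carries far more state (member ids, heartbeats, assignors).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of the ownership rule; NOT Kafka's actual implementation. */
class ToyGroupCoordinator {
    private int generation = 1;
    // partition -> member that currently claims to own it
    private final Map<Integer, String> owners = new HashMap<>();

    /** Hypothetical error for requests that carry an outdated generation. */
    static class StaleGenerationException extends RuntimeException {
        StaleGenerationException(String message) { super(message); }
    }

    private void checkGeneration(int memberGeneration) {
        if (memberGeneration != generation) {
            throw new StaleGenerationException(
                "generation " + memberGeneration + " != " + generation);
        }
    }

    /** A member (re)joins and claims the partitions it still owns. */
    void joinGroup(String member, int memberGeneration, Set<Integer> claimed) {
        checkGeneration(memberGeneration);
        for (int partition : claimed) {
            owners.put(partition, member);
        }
    }

    /** A partition is re-assignable only when no member claims it. */
    boolean isReassignable(int partition) {
        return !owners.containsKey(partition);
    }

    /** Commits are accepted only from the current generation's owner. */
    void commitOffset(String member, int memberGeneration, int partition) {
        checkGeneration(memberGeneration);
        if (!member.equals(owners.get(partition))) {
            throw new IllegalStateException(member + " does not own " + partition);
        }
    }

    /** The group moves on without a member: its claims are dropped. */
    int formNewGenerationWithout(String member) {
        owners.values().removeIf(member::equals);
        return ++generation;
    }

    public static void main(String[] args) {
        ToyGroupCoordinator coordinator = new ToyGroupCoordinator();

        // Case 1: revocation failed, the consumer polls again and re-claims
        // partition 1, so the partition is not handed to anyone else.
        coordinator.joinGroup("c1", 1, Set.of(1));
        System.out.println("re-assignable: " + coordinator.isReassignable(1));

        // Case 2: a new generation forms before the consumer retries, so its
        // commit with the old generation is rejected.
        coordinator.formNewGenerationWithout("c1");
        try {
            coordinator.commitOffset("c1", 1, 1);
        } catch (StaleGenerationException e) {
            System.out.println("commit rejected: " + e.getMessage());
        }
    }
}
```

In this model the consumer is never in an unsafe middle state: either its
claim keeps the partition away from other members, or its stale generation
makes every later commit fail.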

> Consumer proceeds on when ConsumerRebalanceListener fails
> ---------------------------------------------------------
>
>                 Key: KAFKA-4600
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4600
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.10.1.1
>            Reporter: Braedon Vickers
>            Priority: Major
>
> One of the use cases for a ConsumerRebalanceListener is to load state 
> necessary for processing a partition when it is assigned. However, when 
> ConsumerRebalanceListener.onPartitionsAssigned() fails for some reason (i.e. 
> the state isn't loaded), the error is logged and the consumer proceeds on as 
> if nothing happened, happily consuming messages from the new partition. When 
> the state is relied upon for correct processing, this can be very bad, e.g. 
> data loss can occur.
> It would be better if the error was propagated up so it could be dealt with 
> normally. At the very least the assignment should fail so the consumer 
> doesn't see any messages from the new partitions, and the rebalance can be 
> reattempted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
