[ 
https://issues.apache.org/jira/browse/KAFKA-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17360328#comment-17360328
 ] 

A. Sophie Blee-Goldman commented on KAFKA-12920:
------------------------------------------------

While there does seem to be a possible issue with the cooperative-sticky 
assignor, I don't believe we've found it yet. It's not expected that the 
cooperative-sticky assignor would have cleared the `memberAssignment` and 
`generation` since only the plain sticky assignor uses that stored info. The 
cooperative-sticky assignor gets the partitions for the previous assignment not 
from the `memberAssignment` but instead directly from the SubscriptionState, 
which _should_ be cleared any time `onPartitionsLost` is invoked.

However we can keep this ticket open for now to track the investigation into 
the cooperative-sticky assignor. For some context, we've seen a report of 
continuous rebalancing with JoinGroup requests that seem to encode the same 
partition in the previous assignment of two consumers. It's not clear how this 
situation arose, but once the cooperative-sticky assignor is given these 
initial conditions, it will detect the issue and throw an 
IllegalStateException.

At the very least we should definitely improve the assignor to check for this 
condition and handle it by invalidating those previous assignments, rather than 
just throwing an exception repeatedly. But what's still unclear is how we got 
these conditions to begin with – it doesn't seem possible for the assignor to 
have produced an assignment with double partition ownership, as it would have 
thrown this IllegalStateException. That's what still needs to be investigated, 
and what this report was hoping to have uncovered.
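
As a rough illustration of the "check and invalidate" idea, here is a minimal 
Java sketch. It is not the actual assignor code: `OwnedPartitionValidator`, 
`findDoublyClaimed` and `invalidateConflicts` are hypothetical names, and 
partitions are modeled as plain strings rather than Kafka's `TopicPartition`. 
It only shows how conflicting previous-ownership claims could be dropped 
instead of triggering an exception.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Kafka. Keys are consumer ids, values are the
// partitions each consumer claims to have owned in the previous assignment.
final class OwnedPartitionValidator {

    // Returns every partition claimed by more than one consumer.
    static Set<String> findDoublyClaimed(Map<String, List<String>> ownedByConsumer) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new HashSet<>();
        for (List<String> owned : ownedByConsumer.values()) {
            // De-duplicate within one consumer so only cross-consumer claims count.
            for (String partition : new HashSet<>(owned)) {
                if (!seen.add(partition)) {
                    duplicates.add(partition);
                }
            }
        }
        return duplicates;
    }

    // Returns a copy of the input in which doubly claimed partitions are removed
    // from every consumer, so they can be treated as unowned by the assignor.
    static Map<String, List<String>> invalidateConflicts(Map<String, List<String>> ownedByConsumer) {
        Set<String> duplicates = findDoublyClaimed(ownedByConsumer);
        Map<String, List<String>> cleaned = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : ownedByConsumer.entrySet()) {
            List<String> kept = new ArrayList<>(entry.getValue());
            kept.removeAll(duplicates);
            cleaned.put(entry.getKey(), kept);
        }
        return cleaned;
    }
}
```

For example, if one consumer claims `t-0` and `t-1` and another claims `t-1` 
and `t-2`, the sketch drops `t-1` from both claims and leaves `t-0` and `t-2` 
with their original owners.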

> Consumer's cooperative sticky assignor need to clear generation / assignment 
> data upon `onPartitionsLost`
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-12920
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12920
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Priority: Major
>              Labels: bug, consumer
>
> The Consumer's cooperative-sticky assignor does not track the owned partitions 
> inside the assignor, i.e. when it resets its state in the event of 
> ``onPartitionsLost``, the ``memberAssignment`` and ``generation`` inside the 
> assignor are not cleared. This would cause a member to join with an empty 
> generation in the protocol while its non-empty user-data still encodes the old 
> assignment (and hence passes the validation check on the broker side during 
> JoinGroup), and eventually cause a single partition to be assigned to 
> multiple consumers within a generation.
> We should let the assignor also clear its assignment/generation when 
> ``onPartitionsLost`` is triggered in order to avoid this scenario.
> Note that 1) for the regular sticky assignor the generation would still have 
> an older value, and this would cause the previously owned partitions to be 
> discarded during the assignment, and 2) for Streams' sticky assignor, its 
> encoding would indeed be cleared along with ``onPartitionsLost``. Hence only 
> the Consumer's cooperative-sticky assignor has this issue to solve.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
