[ https://issues.apache.org/jira/browse/KAFKA-12984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434281#comment-17434281 ]
Andrei D edited comment on KAFKA-12984 at 10/26/21, 11:15 AM:
--------------------------------------------------------------

Here are the debug logs. The freeze happened at 15:48:23. We used the
CooperativeStickyAssignor from v3.0.0 here.

[^logs-insights-results-kafka.csv]


was (Author: andy_dufresne):
Here are debug logs. The freeze happened at 15:48:23

[^logs-insights-results-kafka.csv]

> Cooperative sticky assignor can get stuck with invalid SubscriptionState
> input metadata
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-12984
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12984
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>            Reporter: A. Sophie Blee-Goldman
>            Assignee: A. Sophie Blee-Goldman
>            Priority: Blocker
>             Fix For: 2.8.1, 3.0.0
>
>         Attachments: image-2021-10-25-11-53-40-221.png,
> log-events-viewer-result-kafka.numbers, logs-insights-results-kafka.csv,
> logs-insights-results-kafka.numbers
>
>
> Some users have reported seeing their consumer group become stuck in the
> CompletingRebalance phase when using the cooperative-sticky assignor. Based
> on the request metadata we were able to deduce that multiple consumers were
> reporting the same partition(s) in their "ownedPartitions" field of the
> consumer protocol. Since this is an invalid state, the input causes the
> cooperative-sticky assignor to detect that something is wrong and throw an
> IllegalStateException. If the consumer application is set up to simply retry,
> this will cause the group to appear to hang in the rebalance state.
> The "ownedPartitions" field is encoded based on the ConsumerCoordinator's
> SubscriptionState, which was assumed to always be up to date. However, there
> may be cases where the consumer has dropped out of the group but fails to
> clear the SubscriptionState, allowing it to report some partitions as owned
> that have since been reassigned to another member.
> We should (a) fix the sticky assignment algorithm to resolve cases of
> improper input conditions by invalidating the "ownedPartitions" in cases of
> double ownership, and (b) shore up the ConsumerCoordinator logic to better
> handle rejoining the group and keeping its internal state consistent. See
> KAFKA-12983 for more details on (b).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
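
For illustration only, here is a minimal sketch of what (a) could look like: any partition claimed in the "ownedPartitions" of more than one member is treated as unowned before assignment, so no exception is thrown. The class name, the method name and the use of plain strings for member IDs and partitions are assumptions made for this example. This is not the actual Kafka assignor code, and the real fix may resolve double ownership differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Kafka code base.
final class OwnedPartitionsSketch {

    // Returns a copy of the input in which every partition claimed by more
    // than one member has been removed from all claimants' owned lists.
    static Map<String, List<String>> invalidateDoubleOwnership(
            Map<String, List<String>> ownedByMember) {
        // Count how many distinct members claim each partition.
        Map<String, Integer> claims = new HashMap<>();
        for (List<String> owned : ownedByMember.values()) {
            for (String partition : new LinkedHashSet<>(owned)) {
                claims.merge(partition, 1, Integer::sum);
            }
        }
        // Keep only partitions with exactly one claimant.
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : ownedByMember.entrySet()) {
            List<String> kept = new ArrayList<>();
            for (String partition : new LinkedHashSet<>(entry.getValue())) {
                if (claims.get(partition) == 1) {
                    kept.add(partition);
                }
            }
            result.put(entry.getKey(), kept);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> claimed = new LinkedHashMap<>();
        claimed.put("consumer-1", Arrays.asList("orders-0", "orders-1"));
        // Stale state: consumer-2 still reports orders-1 as owned.
        claimed.put("consumer-2", Arrays.asList("orders-1", "orders-2"));
        System.out.println(invalidateDoubleOwnership(claimed));
    }
}
```

In this sketch the doubly claimed partition is dropped from both members rather than awarded to one of them, since the input alone does not say whose claim is stale. The partition would then be handed out as if it were unassigned.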