[
https://issues.apache.org/jira/browse/KAFKA-10105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ismael Juma resolved KAFKA-10105.
---------------------------------
Resolution: Duplicate
Closing as duplicate of KAFKA-9752. Please reopen if reproduced with 2.5.0.
> Regression in group coordinator dealing with flaky clients joining while
> leaving
> --------------------------------------------------------------------------------
>
> Key: KAFKA-10105
> URL: https://issues.apache.org/jira/browse/KAFKA-10105
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 2.4.1
> Environment: Kafka 2.4.1 on jre 11 on debian 9 in docker
> Reporter: William Reynolds
> Priority: Major
>
> Since upgrading a cluster from 1.1.0 to 2.4.1, the broker no longer deals
> correctly with a consumer sending a join after a leave.
> What happens now is that if a consumer sends a leave and then follows up by
> trying to send a join again as it is shutting down, the group coordinator adds
> the leaving member to the group but never seems to heartbeat that member.
> Since that consumer is then gone, when it joins again after starting it is
> added as a new member, but the zombie member is still there and is included in
> the partition assignment, which means that those partitions never get consumed
> from. What can also happen is that one of the zombies becomes group leader, so
> the rebalance gets stuck forever and the group is entirely blocked.
> I have not been able to track down where this got introduced between 1.1.0
> and 2.4.1, but I will look further into this. Unfortunately the logs are
> essentially silent about the zombie members, and I only had INFO level logging
> on during the issue. By stopping all the consumers in the group and
> restarting the broker coordinating that group we could get back to a working
> state.
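
The effect described above, a zombie member holding partitions that nobody reads, can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not Kafka's group coordinator or any real assignor, the round-robin helper and the member ids are hypothetical, and it only shows why a dead member in the member list leaves partitions unconsumed.

```python
def assign_round_robin(members, partitions):
    """Toy assignor: hand out partitions one at a time over the sorted member ids."""
    ordered = sorted(members)
    assignment = {m: [] for m in ordered}
    for i, p in enumerate(partitions):
        assignment[ordered[i % len(ordered)]].append(p)
    return assignment


partitions = list(range(6))

# Hypothetical member ids. "consumer-a-old" stands for the zombie entry left
# behind when a consumer sent a leave followed by a join while shutting down.
live = {"consumer-a-new", "consumer-b"}
zombie = "consumer-a-old"

# The zombie is still in the group, so it takes part in the assignment.
assignment = assign_round_robin(live | {zombie}, partitions)

# Partitions given to the zombie are never read, because no process backs it.
orphaned = assignment[zombie]
print("orphaned partitions:", orphaned)
```

In this model the live consumers only ever see the partitions not handed to the zombie, which matches the reported symptom of some partitions never being consumed until the group is reset.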
--
This message was sent by Atlassian Jira
(v8.3.4#803005)