[ 
https://issues.apache.org/jira/browse/KAFKA-10105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Reynolds updated KAFKA-10105:
-------------------------------------
    Environment: Kafka 2.4.1 on jre 11 on debian 9 in docker  (was: Kafka 1.1.0 
on jre 8 on debian 9 in docker
Kafka 2.4.1 on jre 11 on debian 9 in docker)

> Regression in group coordinator dealing with flaky clients joining while 
> leaving
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-10105
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10105
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.4.1
>         Environment: Kafka 2.4.1 on jre 11 on debian 9 in docker
>            Reporter: William Reynolds
>            Priority: Major
>
> Since upgrading a cluster from 1.1.0 to 2.4.1, the broker no longer deals 
> correctly with a consumer that sends a join after a leave.
> What happens now is that if a consumer sends a leave and then follows up by 
> trying to join again while it is shutting down, the group coordinator adds 
> the leaving member to the group but never seems to heartbeat that member.
> Since that consumer is then gone, when it joins again after starting it is 
> added as a new member, but the zombie member is still there and is included 
> in the partition assignment, which means that those partitions never get 
> consumed from. What can also happen is that one of the zombies becomes group 
> leader, so the rebalance gets stuck forever and the group is entirely blocked.
> I have not been able to track down where this was introduced between 1.1.0 
> and 2.4.1, but I will look further into this. Unfortunately the logs are 
> essentially silent about the zombie members, and I only had INFO level 
> logging on during the issue. By stopping all the consumers in the group and 
> restarting the broker coordinating that group we could get back to a working 
> state.
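
To illustrate why a zombie member leaves partitions unconsumed, here is a toy model of the situation described above. It is only a sketch under stated assumptions: it is not Kafka's coordinator or assignor code, the round-robin assignment is a simplified stand-in, and the member names and partition count are hypothetical.

```python
def assign_round_robin(members, partitions):
    """Spread partitions over members one at a time, in sorted member order.

    Simplified stand-in for a partition assignor (assumption, not Kafka code).
    """
    ordered = sorted(members)
    assignment = {member: [] for member in ordered}
    for index, partition in enumerate(partitions):
        assignment[ordered[index % len(ordered)]].append(partition)
    return assignment


def unconsumed_partitions(assignment, live_members):
    """Return partitions assigned to members that no longer exist."""
    return sorted(
        partition
        for member, owned in assignment.items()
        if member not in live_members
        for partition in owned
    )


# Hypothetical group state after the restart described in the report:
# "consumer-1-old" is the zombie left behind by the join-after-leave,
# "consumer-1-new" is the same client after it restarted.
group_members = {"consumer-1-old", "consumer-1-new", "consumer-2"}
live_members = {"consumer-1-new", "consumer-2"}
partitions = list(range(6))

assignment = assign_round_robin(group_members, partitions)
stalled = unconsumed_partitions(assignment, live_members)
print("assignment:", assignment)
print("partitions nobody is consuming:", stalled)
```

In this model the zombie still receives its share of the partitions, and because no live process backs it, those partitions stay idle until the member is removed from the group.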



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
