[ 
https://issues.apache.org/jira/browse/KAFKA-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411926#comment-17411926
 ] 

John Gray edited comment on KAFKA-10643 at 9/8/21, 1:23 PM:
------------------------------------------------------------

We were having this same issue with our new static consumers once their 
changelog topics got large enough. The group would never stabilize because of 
these looping metadata updates. We ended up stabilizing our groups by 
increasing max.poll.interval.ms and metadata.max.age.ms in our streams apps to 
values longer than the time we expected our restore consumers to take to 
restore our large stores; 30 minutes ended up working for us. I am not sure 
whether a metadata update is expected to trigger a rebalance for a static 
consumer group with lots of restoring threads, but it certainly sent our 
groups with large state into a frenzy. It has been a while, so you may have 
moved on from this, but I would be curious to see whether these configs help 
your group, [~maatdeamon].
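
For reference, a minimal sketch of those two settings in a Kafka Streams app,
assuming plain Java configuration (the 30-minute value, application id and
bootstrap servers are illustrative placeholders, not the configuration from
this ticket):

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-stream-app");    // placeholder
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");   // placeholder
    // Allow up to 30 minutes between poll() calls, e.g. while the restore
    // consumer rebuilds large state stores.
    props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG),
              30 * 60 * 1000);
    // Refresh cluster metadata at most every 30 minutes (default is 5 minutes),
    // so periodic metadata updates do not land in the middle of a long restore.
    props.put(StreamsConfig.consumerPrefix(ConsumerConfig.METADATA_MAX_AGE_CONFIG),
              30 * 60 * 1000);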


> Static membership - repetitive PreparingRebalance with updating metadata for 
> member reason
> ------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-10643
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10643
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.6.0
>            Reporter: Eran Levy
>            Priority: Major
>         Attachments: broker-4-11.csv, client-4-11.csv, 
> client-d-9-11-11-2020.csv
>
>
> Kafka Streams 2.6.0, brokers version 2.6.0. Kafka nodes are healthy, the 
> Kafka Streams app is healthy. 
> Configured with static membership. 
> Every 10 minutes (I assume because of topic.metadata.refresh.interval.ms), I 
> see the following group coordinator log for different stream consumers: 
> INFO [GroupCoordinator 2]: Preparing to rebalance group **--**-stream in 
> state PreparingRebalance with old generation 12244 (__consumer_offsets-45) 
> (reason: Updating metadata for member 
> ****-stream-11-1-013edd56-ed93-4370-b07c-1c29fbe72c9a) 
> (kafka.coordinator.group.GroupCoordinator)
> and right after that the following log: 
> INFO [GroupCoordinator 2]: Assignment received from leader for group 
> **-**-stream for generation 12246 (kafka.coordinator.group.GroupCoordinator)
>  
> I looked a bit at the Kafka code and I'm not sure I understand why this is 
> happening - does this line describe what happens here regarding the 
> "reason:"? [https://github.com/apache/kafka/blob/7ca299b8c0f2f3256c40b694078e422350c20d19/core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala#L311]
> I also don't see it happening that often in the other Kafka Streams 
> applications that we have. 
> The only suspicious thing I see is that roughly every hour, different pods of 
> that Kafka Streams application throw this exception: 
> {"timestamp":"2020-10-25T06:44:20.414Z","level":"INFO","thread":"**-**-stream-94561945-4191-4a07-ac1b-07b27e044402-StreamThread-1","logger":"org.apache.kafka.clients.FetchSessionHandler","message":"[Consumer
>  
> clientId=**-**-stream-94561945-4191-4a07-ac1b-07b27e044402-StreamThread-1-restore-consumer,
>  groupId=null] Error sending fetch request (sessionId=34683236, epoch=2872) 
> to node 
> 3:","context":"default","exception":"org.apache.kafka.common.errors.DisconnectException:
>  null\n"}
> I came across this strange behaviour after starting to investigate a 
> rebalance that got stuck after one of the members left the group - the only 
> thing I found is that, perhaps because of these frequent PreparingRebalance 
> states, the app might be affected by this bug - KAFKA-9752?
> I don't understand why it happens; it didn't happen before I applied static 
> membership to that Kafka Streams application (around 2 weeks ago). 
> I will be happy if you can help me.
>  
>  
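
For context on the "static membership" configuration mentioned in the
description above: in Kafka Streams it is enabled by giving each application
instance a stable group.instance.id. The sketch below is a generic
illustration only (the environment variable and session timeout are
assumptions, not the reporter's actual setup):

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    Properties props = new Properties();
    // A stable, per-instance id (e.g. the pod or host name) makes the members
    // static: restarting an instance with the same id within the session
    // timeout does not trigger a rebalance.
    props.put(StreamsConfig.consumerPrefix(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG),
              System.getenv("POD_NAME"));
    // Static membership is usually paired with a session timeout long enough
    // to cover a restart of the instance.
    props.put(StreamsConfig.consumerPrefix(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG),
              60_000);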



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
