[ https://issues.apache.org/jira/browse/KAFKA-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237660#comment-17237660 ]
A. Sophie Blee-Goldman commented on KAFKA-10643:
------------------------------------------------

Sorry, yeah – I meant debug logs for the Streams app. That should help us narrow down whether the app is getting stuck somewhere, or what else might be happening. It's always possible that this is a broker-side bug, since the static membership feature spans both client and broker code, but I think it makes sense to start by investigating the Streams app and its logs, since in my experience these sorts of issues are typically client-side. FWIW I'm not personally an expert in static membership – not sure if [~boyang] has any ideas on what could be causing this, or whether there are any known bugs in static membership that cause members to drop out, etc.

> Static membership - repetitive PreparingRebalance with updating metadata for member reason
> ------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-10643
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10643
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.6.0
>            Reporter: Eran Levy
>            Priority: Major
>         Attachments: broker-4-11.csv, client-4-11.csv, client-d-9-11-11-2020.csv
>
> Kafka Streams 2.6.0, brokers version 2.6.0. The Kafka nodes are healthy and the Kafka Streams app is healthy.
> Configured with static membership.
> Every 10 minutes (I assume because of topic.metadata.refresh.interval.ms), I see the following group coordinator log for different stream consumers:
> INFO [GroupCoordinator 2]: Preparing to rebalance group **--**-stream in state PreparingRebalance with old generation 12244 (__consumer_offsets-45) (reason: Updating metadata for member ****-stream-11-1-013edd56-ed93-4370-b07c-1c29fbe72c9a) (kafka.coordinator.group.GroupCoordinator)
> and right after that the following log:
> INFO [GroupCoordinator 2]: Assignment received from leader for group **-**-stream for generation 12246 (kafka.coordinator.group.GroupCoordinator)
> I looked a bit at the Kafka code and I'm not sure I understand why this happens – does this line describe the situation behind the "reason:" above? [https://github.com/apache/kafka/blob/7ca299b8c0f2f3256c40b694078e422350c20d19/core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala#L311]
> I also don't see this happening as often in the other Kafka Streams applications that we have.
> The only suspicious thing I see is that around every hour, different pods of that Kafka Streams application throw this exception:
> {"timestamp":"2020-10-25T06:44:20.414Z","level":"INFO","thread":"**-**-stream-94561945-4191-4a07-ac1b-07b27e044402-StreamThread-1","logger":"org.apache.kafka.clients.FetchSessionHandler","message":"[Consumer clientId=**-**-stream-94561945-4191-4a07-ac1b-07b27e044402-StreamThread-1-restore-consumer, groupId=null] Error sending fetch request (sessionId=34683236, epoch=2872) to node 3:","context":"default","exception":"org.apache.kafka.common.errors.DisconnectException: null\n"}
> I came across this strange behaviour after I started to investigate a stuck rebalance: one of the members left the group and the rebalance got stuck. The only thing I found is that, perhaps because of these too-frequent PreparingRebalance states, the app might be affected by this bug: KAFKA-9752?
> I don't understand why this happens; it didn't occur before I applied static membership to that Kafka Streams application (around 2 weeks ago).
> I'll be happy if you can help.
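
To capture the debug logs requested in the comment above, a minimal log4j.properties sketch for the Streams app could look like the following. The appender and layout here are illustrative assumptions; the two logger lines are the part that matters:

    # Keep the root logger at INFO so the output stays readable
    log4j.rootLogger=INFO, stdout
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n
    # DEBUG for Streams internals and the embedded consumer, which covers
    # group membership, heartbeats, and rebalance transitions
    log4j.logger.org.apache.kafka.streams=DEBUG
    log4j.logger.org.apache.kafka.clients.consumer=DEBUG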
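Since the report hinges on static membership, here is a minimal Java sketch of how it is typically enabled for a Streams app under KIP-345. The application id, bootstrap servers, and the instance-id scheme are placeholder assumptions, not the reporter's actual config:

    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    public class StaticMembershipProps {
        public static Properties build(String instanceId) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-stream");      // placeholder
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder

            // Static membership (KIP-345): give each instance a group.instance.id
            // that is unique within the group and stable across restarts,
            // e.g. the pod name from a Kubernetes StatefulSet.
            props.put(StreamsConfig.consumerPrefix(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG),
                      instanceId);

            // With static membership, the coordinator waits session.timeout.ms for a
            // known instance to return before rebalancing, so this is usually raised
            // above the default.
            props.put(StreamsConfig.consumerPrefix(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG),
                      "60000");
            return props;
        }
    }

The design point is that group.instance.id must be unique per instance but stable across restarts; an id that is duplicated across pods, or that changes on restart, can surface as fencing or unexpected rebalances.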