[ 
https://issues.apache.org/jira/browse/KAFKA-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236547#comment-17236547
 ] 

A. Sophie Blee-Goldman commented on KAFKA-10643:
------------------------------------------------

Hey [~eran-levy]

Thanks for uploading a larger sample of logs. I just had a look through the 
latest and think you may be hitting 
[KAFKA-10455|https://issues.apache.org/jira/browse/KAFKA-10455] , although I 
don't think that's necessarily related to the problem that you're experiencing. 
It only explains why there seem to be so many probing rebalances according to 
the logs, but these probing rebalances aren't actually occuring.

I'll try to dig for more clues but it's difficult to say with only the 
INFO-level logs available, since a long gap between logs may be due to healthy 
processing or due to Streams being stuck somewhere. If it would be possible to 
collect DEBUG logs that would definitely expedite the debugging process.

The only other thing I can think of is that maybe there's something going on 
with rocksdb, eg a very long compaction that's causing write stalls. I'd 
recommend collecting the rocksdb metrics if you aren't already, and looking 
into those for clues

> Static membership - repetitive PreparingRebalance with updating metadata for 
> member reason
> ------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-10643
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10643
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.6.0
>            Reporter: Eran Levy
>            Priority: Major
>         Attachments: broker-4-11.csv, client-4-11.csv, 
> client-d-9-11-11-2020.csv
>
>
> Kafka streams 2.6.0, brokers version 2.6.0. Kafka nodes are healthy, kafka 
> streams app is healthy. 
> Configured with static membership. 
> Every 10 minutes (I assume cause of topic.metadata.refresh.interval.ms), I 
> see the following group coordinator log for different stream consumers: 
> INFO [GroupCoordinator 2]: Preparing to rebalance group **--**-stream in 
> state PreparingRebalance with old generation 12244 (__consumer_offsets-45) 
> (reason: Updating metadata for member 
> ****-stream-11-1-013edd56-ed93-4370-b07c-1c29fbe72c9a) 
> (kafka.coordinator.group.GroupCoordinator)
> and right after that the following log: 
> INFO [GroupCoordinator 2]: Assignment received from leader for group 
> **-**-stream for generation 12246 (kafka.coordinator.group.GroupCoordinator)
>  
> Looked a bit on the kafka code and Im not sure that I get why such a thing 
> happening - is this line described the situation that happens here re the 
> "reason:"?[https://github.com/apache/kafka/blob/7ca299b8c0f2f3256c40b694078e422350c20d19/core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala#L311]
> I also dont see it happening too often in other kafka streams applications 
> that we have. 
> The only thing suspicious that I see around every hour that different pods of 
> that kafka streams application throw this exception: 
> {"timestamp":"2020-10-25T06:44:20.414Z","level":"INFO","thread":"**-**-stream-94561945-4191-4a07-ac1b-07b27e044402-StreamThread-1","logger":"org.apache.kafka.clients.FetchSessionHandler","message":"[Consumer
>  
> clientId=**-**-stream-94561945-4191-4a07-ac1b-07b27e044402-StreamThread-1-restore-consumer,
>  groupId=null] Error sending fetch request (sessionId=34683236, epoch=2872) 
> to node 
> 3:","context":"default","exception":"org.apache.kafka.common.errors.DisconnectException:
>  null\n"}
> I came across this strange behaviour after stated to investigate a strange 
> stuck rebalancing state after one of the members left the group and caused 
> the rebalance to stuck - the only thing that I found is that maybe because 
> that too often preparing to rebalance states, the app might affected of this 
> bug - KAFKA-9752 ?
> I dont understand why it happens, it wasn't before I applied static 
> membership to that kafka streams application (since around 2 weeks ago). 
> Will be happy if you can help me
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to