[ 
https://issues.apache.org/jira/browse/KAFKA-10127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310982#comment-17310982
 ] 

Ajay Patel commented on KAFKA-10127:
------------------------------------

You might have better luck with 
[KAFKA-2729|https://issues.apache.org/jira/browse/KAFKA-2729], as that issue 
sounds identical and has seen more activity.

> kafka cluster not recovering - Shrinking ISR  continously
> ---------------------------------------------------------
>
>                 Key: KAFKA-10127
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10127
>             Project: Kafka
>          Issue Type: Bug
>          Components: replication, zkclient
>    Affects Versions: 2.4.1
>         Environment: using kafka version 2.4.1 and zookeeper version 3.5.7
>            Reporter: Youssef BOUZAIENNE
>            Priority: Major
>
> We intermittently face an issue where our Kafka cluster goes into a weird 
> state. We see the following log messages repeating:
> [2020-06-06 08:35:48,117] INFO [Partition test broker=1002] Cached zkVersion 
> 620 not equal to that in zookeeper, skip updating ISR 
> (kafka.cluster.Partition)
>  [2020-06-06 08:35:48,117] INFO [Partition test broker=1002] Shrinking ISR 
> from 1006,1002 to 1002. Leader: (highWatermark: 3222733572, endOffset: 
> 3222741893). Out of sync replicas: (brokerId: 1006, endOffset: 3222733572). 
> (kafka.cluster.Partition)
>  
> Just before that, our ZooKeeper session expired, which led us to that state.
>  
> After we increased the two values below, we encounter the issue less 
> frequently, but it still appears from time to time, and the only way to 
> recover is to restart the Kafka service on all brokers.
> zookeeper.session.timeout.ms=18000
> replica.lag.time.max.ms=30000
>  
> Any thoughts on this, please?



