Dear kafka experts,

I've got a kafka cluster with 3 brokers running in docker-containers on 
different hosts in version 2.1.1 of kafka. The cluster is serving some kafka 
streams apps. The topics are configured with replication.factor 3 and 
min.insync.replicas 2.  The cluster works fine most of the time. 

But just now, I encountered strange behavior. One broker (id 1) was suddenly 
out of the blue starting to log ISR Shrinks for  for some partitions (not all)  
it was leading:
"INFO [Partition __transaction_state-42 broker=1] Shrinking ISR from 1,3,2 to 1 
(kafka.cluster.Partition)"

About 2 seconds later it started to expand these ISR again while some other 
partitions where still shrinking.
Three minutes later, this scenario repeated itself. 

I could not find anything other that was out of the ordinary neither in the 
logs of the kafka brokers nor in the metrics (network, CPU, RAM, disk,...).
This leads to the question why the two brokers fell out of the ISR or the other 
broker at least had the impression the other two did? Has anyone experienced 
similar behavior? Or is this expected an can happen from time to time?

I will definitely set up retries and retry.backoff.ms config the avoid the 
streams applications going down because of a small hunch like this in the 
future.

Thanks for your input.
Best,
Claudia

Reply via email to