Dear kafka experts, I've got a kafka cluster with 3 brokers running in docker-containers on different hosts in version 2.1.1 of kafka. The cluster is serving some kafka streams apps. The topics are configured with replication.factor 3 and min.insync.replicas 2. The cluster works fine most of the time.
But just now, I encountered strange behavior. One broker (id 1) was suddenly out of the blue starting to log ISR Shrinks for for some partitions (not all) it was leading: "INFO [Partition __transaction_state-42 broker=1] Shrinking ISR from 1,3,2 to 1 (kafka.cluster.Partition)" About 2 seconds later it started to expand these ISR again while some other partitions where still shrinking. Three minutes later, this scenario repeated itself. I could not find anything other that was out of the ordinary neither in the logs of the kafka brokers nor in the metrics (network, CPU, RAM, disk,...). This leads to the question why the two brokers fell out of the ISR or the other broker at least had the impression the other two did? Has anyone experienced similar behavior? Or is this expected an can happen from time to time? I will definitely set up retries and retry.backoff.ms config the avoid the streams applications going down because of a small hunch like this in the future. Thanks for your input. Best, Claudia