Hi, we have a Kafka cluster of 6 brokers, each machine with 8 CPUs and 16 GB of memory. The cluster has about 1600 topics, and each broker hosts about 1700 partition leaders and 1600 partition replicas. When we restart a healthy broker, more than 500 partitions shrink and expand their ISR repeatedly during the restart, and we see many log entries like the following:
[2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
[2017-11-09 17:06:22,047] INFO Partition [Yelp,5] on broker 4759726: Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 (kafka.cluster.Partition)
[2017-11-09 17:06:28,634] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
[2017-11-09 17:06:44,658] INFO Partition [Yelp,5] on broker 4759726: Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 (kafka.cluster.Partition)
[2017-11-09 17:06:47,611] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
[2017-11-09 17:07:19,703] INFO Partition [Yelp,5] on broker 4759726: Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 (kafka.cluster.Partition)
[2017-11-09 17:07:26,811] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
…

The shrinking and expanding then repeats after 30 minutes, which we took to be the leader.imbalance.check.interval.seconds interval (although the documented default for that setting is 300 seconds, i.e. 5 minutes). At that point the controller log shows the automatic leader rebalance, which moves leadership of some partitions back to the restarted broker.

We see no ISR shrinking or expanding while the cluster is running normally, only when we restart a broker, so the number of replica fetcher threads (num.replica.fetchers) is left at 1, which seems to be enough.

We can reproduce this on every restart. Can someone give some suggestions? Thanks in advance.
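P.S. In case it helps to quantify the flapping, here is a minimal sketch of how the shrink/expand events could be tallied per partition. It assumes the log lines look exactly like the ones quoted above; the function name count_isr_changes and the two sample lines are only for illustration and are not part of Kafka.

```python
import re
from collections import Counter

# Matches ISR change entries in the format quoted above, e.g.
# "[ts] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR ..."
ISR_LINE = re.compile(
    r"\[(?P<ts>[^\]]+)\] INFO Partition \[(?P<topic>[^,\]]+),(?P<partition>\d+)\] "
    r"on broker (?P<broker>\d+): (?P<action>Shrinking|Expanding) ISR"
)


def count_isr_changes(lines):
    """Count Shrinking/Expanding events per (topic, partition, action)."""
    counts = Counter()
    for line in lines:
        # finditer also copes with several entries pasted onto one line
        for m in ISR_LINE.finditer(line):
            key = (m.group("topic"), int(m.group("partition")), m.group("action"))
            counts[key] += 1
    return counts


# Two entries copied from the log excerpt above, used as sample input.
sample = [
    "[2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726: "
    "Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 "
    "(kafka.cluster.Partition)",
    "[2017-11-09 17:06:22,047] INFO Partition [Yelp,5] on broker 4759726: "
    "Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 "
    "(kafka.cluster.Partition)",
]

for key, n in sorted(count_isr_changes(sample).items()):
    print(key, n)
```

To run it against a real broker log, the lines of the log file can be passed in place of sample, for example by opening the file and handing the file object to count_isr_changes.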