Hi,
We have a Kafka cluster of 6 brokers, each with 8 CPUs and 16 GB of
memory. The cluster has about 1600 topics, and each broker holds about
1700 partition leaders and 1600 partition replicas.
When we restart a normally running broker, we see 500+ partitions shrink
and expand their ISR repeatedly during the restart.
There are many log entries like the ones below.
[2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726:
Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
(kafka.cluster.Partition)
[2017-11-09 17:06:22,047] INFO Partition [Yelp,5] on broker 4759726: Shrinking
ISR for partition [Yelp,5] from 4759726,4759750 to 4759726
(kafka.cluster.Partition)
[2017-11-09 17:06:28,634] INFO Partition [Yelp,5] on broker 4759726: Expanding
ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
(kafka.cluster.Partition)
[2017-11-09 17:06:44,658] INFO Partition [Yelp,5] on broker 4759726: Shrinking
ISR for partition [Yelp,5] from 4759726,4759750 to 4759726
(kafka.cluster.Partition)
[2017-11-09 17:06:47,611] INFO Partition [Yelp,5] on broker 4759726: Expanding
ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
(kafka.cluster.Partition)
[2017-11-09 17:07:19,703] INFO Partition [Yelp,5] on broker 4759726: Shrinking
ISR for partition [Yelp,5] from 4759726,4759750 to 4759726
(kafka.cluster.Partition)
[2017-11-09 17:07:26,811] INFO Partition [Yelp,5] on broker 4759726: Expanding
ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
(kafka.cluster.Partition)
…
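To put a number on the flapping, a small script along these lines could count the ISR changes per partition in the broker log. This is only a sketch: it assumes the log format shown above, and the function name count_isr_changes is made up for illustration.

```python
import re
from collections import Counter

# Matches entries such as:
# [2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR ...
ISR_EVENT = re.compile(
    r"INFO Partition \[(?P<topic>[^,\]]+),(?P<partition>\d+)\] on broker \d+:\s*"
    r"(?P<action>Expanding|Shrinking)"
)

def count_isr_changes(text):
    """Return {(topic, partition): Counter of 'Expanding' / 'Shrinking' events}."""
    # Log entries may be wrapped across lines, so collapse all whitespace first.
    flat = " ".join(text.split())
    counts = {}
    for match in ISR_EVENT.finditer(flat):
        key = (match.group("topic"), int(match.group("partition")))
        counts.setdefault(key, Counter())[match.group("action")] += 1
    return counts

# Two entries copied from the log excerpt above.
sample = """
[2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726:
Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
(kafka.cluster.Partition)
[2017-11-09 17:06:22,047] INFO Partition [Yelp,5] on broker 4759726: Shrinking
ISR for partition [Yelp,5] from 4759726,4759750 to 4759726
(kafka.cluster.Partition)
"""

for (topic, partition), events in count_isr_changes(sample).items():
    print(topic, partition, events["Expanding"], events["Shrinking"])
```

Run against the full server log, this would show which partitions flap most often and whether the same ones flap on every restart.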
The shrinking and expanding repeats after 30 minutes, which is the default
value of leader.imbalance.check.interval.seconds. At that time we can see
the controller's auto rebalance in its log, which can move the leadership
of some partitions back to the restarted broker.
We see no ISR shrinking or expanding while the cluster is running, only
when we restart a broker, so replica.fetch.thread.num is 1, and that seems
to be enough.
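As a rough back-of-the-envelope check of the fetcher load, here is a sketch under two assumptions: that the setting we mention corresponds to Kafka's num.replica.fetchers (fetcher threads per source broker), and that the leaders of our follower partitions are spread evenly over the other 5 brokers. The function name is hypothetical.

```python
import math

def partitions_per_fetcher(follower_partitions, source_brokers, num_replica_fetchers):
    """Rough estimate of the partitions handled by one replica fetcher thread."""
    # Assumption: one set of num_replica_fetchers threads per source (leader) broker.
    threads = source_brokers * num_replica_fetchers
    return math.ceil(follower_partitions / threads)

# Numbers from this cluster: about 1600 follower partitions per broker and
# 6 brokers, so up to 5 other brokers hold the leaders (even spread assumed).
for fetchers in (1, 2, 4):
    print(fetchers, partitions_per_fetcher(1600, 5, fetchers))
```

With one fetcher per source broker, each thread would cover roughly 320 partitions under these assumptions; this is only an estimate, not a measurement from our cluster.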
We can reproduce this on every restart. Can someone give us some suggestions?
Thanks in advance.