Hi.

When a broker crashes and restarts after a long downtime (say 30
minutes), it has to re-replicate data to bring all of its replicas back
in sync with the leaders. If the broker is the preferred replica for
some partition, it may become the leader of that partition through a
preferred replica leader election while it is still catching up on
other partitions.


This scenario can lead to high incoming throughput on the broker during
the catch-up phase and cause back pressure with certain storage volumes
(which have a fixed maximum throughput). The back pressure can slow
down recovery and shows up as client application errors when producing
to or consuming from the recovering broker.


I am looking for ways to mitigate this problem. These are the two
solutions I am aware of:


1. Scale out the cluster to more brokers, so that each broker carries
less replication traffic during recovery.


2. Keep automatic preferred replica leader elections disabled and
manually trigger one preferred replica leader election after the broker
has come back up and fully caught up (see the sketch below).
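
For what it's worth, here is a rough sketch of how I understand option
2 would look (assuming Kafka 2.4+; the bootstrap server is a
placeholder, and older versions use kafka-preferred-replica-election.sh
instead):

  # server.properties on each broker: disable automatic preferred
  # replica leader elections
  auto.leader.rebalance.enable=false

  # once the recovered broker has rejoined the ISR for all of its
  # partitions, trigger one preferred replica election manually
  bin/kafka-leader-election.sh --bootstrap-server broker1:9092 \
    --election-type preferred --all-topic-partitions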


Are there other solutions?
