Colin McCabe created KAFKA-17793:
------------------------------------
Summary: Improve kcontroller robustness against long delays
Key: KAFKA-17793
URL: https://issues.apache.org/jira/browse/KAFKA-17793
Project: Kafka
Issue Type: Bug
Reporter: Colin McCabe
Assignee: Colin McCabe
As described in KIP-500, the Kafka controller monitors the liveness of each
broker in the cluster. It gathers this information from heartbeats sent from
the brokers themselves.
In some rare cases, the main controller thread may be blocked for several
seconds at a time. In the current code, the controller cannot update the last
contact times for the brokers while it is blocked, so brokers that are still
sending heartbeats may appear stale.
This PR changes the controller heartbeat handling to be partially lockless.
Specifically, the last contact time for each broker will be updated locklessly
before the rest of the heartbeat handling runs. This ensures that a heartbeat
is recorded even while the main controller thread is blocked.
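The lockless update described above could look roughly like the following
sketch. It is only an illustration under stated assumptions: HeartbeatTracker,
recordContact, and hasExpired are hypothetical names, not the classes or
methods touched by the actual change, and the caller is assumed to supply
timestamps from a monotonic millisecond clock.

```java
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch (not the real Kafka controller code): last contact
 * times live in a concurrent map so that a request-handling thread can
 * record a heartbeat without waiting for the main controller thread.
 */
final class HeartbeatTracker {
    // brokerId -> last contact time in milliseconds (monotonic clock assumed)
    private final ConcurrentHashMap<Integer, Long> lastContactMs =
        new ConcurrentHashMap<>();
    private final long sessionTimeoutMs;

    HeartbeatTracker(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    /** Safe to call from any thread; never blocks on the controller thread. */
    void recordContact(int brokerId, long nowMs) {
        // Keep the newest timestamp even if updates arrive out of order.
        lastContactMs.merge(brokerId, nowMs, Long::max);
    }

    /** Called later by the controller thread when it checks for stale brokers. */
    boolean hasExpired(int brokerId, long nowMs) {
        Long last = lastContactMs.get(brokerId);
        return last == null || nowMs - last > sessionTimeoutMs;
    }
}
```

With this split, the rest of the heartbeat handling can still be queued on the
controller thread; only the timestamp update bypasses it.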
Additionally, this PR adds a PeriodicTaskControlManager to better manage
periodic tasks. This should help with the very common pattern of scheduling a
background task at a fixed frequency. The background task should also be
rescheduled immediately if there is too much work to be done in one event.
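The rescheduling rule for periodic tasks can be sketched as follows. This is
not the PeriodicTaskControlManager API, which this message does not show;
PeriodicTaskSketch and maybeRun are hypothetical names, and the sketch assumes
the task reports whether work remains by returning true.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch of a periodic task that is rescheduled immediately
 * when one run could not finish all pending work.
 */
final class PeriodicTaskSketch {
    private final long periodMs;
    private final BooleanSupplier op; // returns true if more work remains
    private long nextRunMs = 0;       // due immediately on first check

    PeriodicTaskSketch(long periodMs, BooleanSupplier op) {
        this.periodMs = periodMs;
        this.op = op;
    }

    /** Runs the task if it is due and returns the next scheduled run time. */
    long maybeRun(long nowMs) {
        if (nowMs < nextRunMs) {
            return nextRunMs; // not due yet
        }
        boolean moreWork = op.getAsBoolean();
        // More work pending: run again right away. Otherwise wait a full period.
        nextRunMs = moreWork ? nowMs : nowMs + periodMs;
        return nextRunMs;
    }
}
```

Bounding the work done per run and rescheduling immediately keeps each
controller event short while still draining a backlog quickly.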