zzshine created KAFKA-19643:
-------------------------------
Summary: Controller keeps switching and occasionally goes offline.
Key: KAFKA-19643
URL: https://issues.apache.org/jira/browse/KAFKA-19643
Project: Kafka
Issue Type: Bug
Components: controller, kraft
Affects Versions: 3.9.1
Environment: CentOS Linux 7, kernel release 4.19.325
Java 21
Reporter: zzshine
Network communication between the cluster nodes is normal, with no packet
loss, and the cluster is properly configured.
The Kafka server continuously prints the following logs:
[2025-08-25 19:08:55,581] INFO [RaftManager id=1] Become candidate due to fetch
timeout (org.apache.kafka.raft.KafkaRaftClient)
[2025-08-25 19:08:55,686] INFO [RaftManager id=1] Disconnecting from node 2 due
to request timeout. (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:08:55,686] INFO [RaftManager id=1] Cancelled in-flight FETCH
request with correlation id 128927 due to node 2 being disconnected (elapsed
time since creation: 5147ms, elapsed time since send: 5146ms, throttle time:
0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,274] INFO [NodeToControllerChannelManager id=1
name=heartbeat] Disconnecting from node 3 due to request timeout.
(org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,274] INFO [NodeToControllerChannelManager id=1
name=heartbeat] Cancelled in-flight BROKER_HEARTBEAT request with correlation
id 871 due to node 3 being disconnected (elapsed time since creation: 4004ms,
elapsed time since send: 4004ms, throttle time: 0ms, request timeout: 4000ms)
(org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,807] INFO [RaftManager id=1] Disconnecting from node 3 due
to request timeout. (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,807] INFO [RaftManager id=1] Cancelled in-flight FETCH
request with correlation id 128995 due to node 3 being disconnected (elapsed
time since creation: 5720ms, elapsed time since send: 5720ms, throttle time:
0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient)
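To make the timeouts in these messages easier to compare, here is a minimal Python sketch that extracts the "Cancelled in-flight" entries and reports how far each request ran past its timeout. It is only an illustration: the regular expression is written against the exact wording of the lines quoted above, and the names CANCELLED and cancelled_requests are hypothetical helpers, not part of Kafka.

```python
import re

# Matches the NetworkClient "Cancelled in-flight" messages as quoted in this
# report. The pattern is an assumption based on those lines only.
CANCELLED = re.compile(
    r"\[(?P<ts>[^\]]+)\] INFO \[(?P<component>[^\]]+)\] "
    r"Cancelled in-flight (?P<api>\w+) request with correlation id (?P<corr>\d+) "
    r"due to node (?P<node>\d+) being disconnected "
    r"\(elapsed time since creation: (?P<created>\d+)ms, "
    r"elapsed time since send: (?P<sent>\d+)ms, "
    r"throttle time: (?P<throttle>\d+)ms, "
    r"request timeout: (?P<timeout>\d+)ms\)"
)


def cancelled_requests(log_text):
    """Return one dict per cancelled request found in log_text."""
    # The mail client wrapped the log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    found = []
    for m in CANCELLED.finditer(flat):
        sent, timeout = int(m["sent"]), int(m["timeout"])
        found.append({
            "time": m["ts"],
            "api": m["api"],
            "node": int(m["node"]),
            "sent_ms": sent,
            "timeout_ms": timeout,
            "overrun_ms": sent - timeout,  # time spent beyond the timeout
        })
    return found


# Two entries copied from the log excerpt above, including the line wrapping.
SAMPLE = """
[2025-08-25 19:08:55,686] INFO [RaftManager id=1] Cancelled in-flight FETCH
request with correlation id 128927 due to node 2 being disconnected (elapsed
time since creation: 5147ms, elapsed time since send: 5146ms, throttle time:
0ms, request timeout: 5000ms) (org.apache.kafka.clients.NetworkClient)
[2025-08-25 19:09:33,274] INFO [NodeToControllerChannelManager id=1
name=heartbeat] Cancelled in-flight BROKER_HEARTBEAT request with correlation
id 871 due to node 3 being disconnected (elapsed time since creation: 4004ms,
elapsed time since send: 4004ms, throttle time: 0ms, request timeout: 4000ms)
(org.apache.kafka.clients.NetworkClient)
"""

if __name__ == "__main__":
    for entry in cancelled_requests(SAMPLE):
        print(entry)
```

Running the same function over the full server log would show which peer nodes and request types time out most often.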
The Kafka KRaft configuration is:
# default 2000
broker.heartbeat.interval.ms=4000
# default 9000
broker.session.timeout.ms=10000
# default 2000
controller.quorum.request.timeout.ms=5000
# default 1000
controller.quorum.election.timeout.ms=5000
# default 1000
controller.quorum.election.backoff.max.ms=3000
# default 2000
controller.quorum.fetch.timeout.ms=6000
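For readers who want to see how far each setting above departs from its default, the following Python sketch pairs every key with the "# default N" comment that precedes it. It is a rough illustration: the default values are taken from the comments in this report rather than looked up in Kafka, and parse_overrides is a hypothetical helper name.

```python
import re

# The configuration fragment from this report, copied as-is.
CONFIG = """
# default 2000
broker.heartbeat.interval.ms=4000
# default 9000
broker.session.timeout.ms=10000
# default 2000
controller.quorum.request.timeout.ms=5000
# default 1000
controller.quorum.election.timeout.ms=5000
# default 1000
controller.quorum.election.backoff.max.ms=3000
# default 2000
controller.quorum.fetch.timeout.ms=6000
"""


def parse_overrides(text):
    """Return {key: (configured_ms, default_ms)} for 'key=value' lines.

    The default comes from a '# default N' comment directly above the key;
    it is None when no such comment is present.
    """
    overrides = {}
    default = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            match = re.search(r"default\s+(\d+)", line)
            default = int(match.group(1)) if match else None
        elif "=" in line:
            key, value = line.split("=", 1)
            overrides[key.strip()] = (int(value.strip()), default)
            default = None  # a default comment applies to one key only
    return overrides


if __name__ == "__main__":
    for key, (configured, default) in parse_overrides(CONFIG).items():
        factor = configured / default if default else float("nan")
        print(f"{key}: {configured} ms (default {default} ms, x{factor:.2f})")
```

All six values are raised above the defaults stated in the comments, so the timeouts in the logs occur even with relaxed settings.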