Juha Mynttinen created KAFKA-17752:
--------------------------------------
Summary: Controller crashes when removed if it is an initial controller
Key: KAFKA-17752
URL: https://issues.apache.org/jira/browse/KAFKA-17752
Project: Kafka
Issue Type: Bug
Affects Versions: 3.9.0
Reporter: Juha Mynttinen
Hey,
Tested using 3.9.0 RC0.
It seems that "kafka-metadata-quorum.sh remove-controller" causes the removed
controller to crash if it is one of the controllers specified with
"--initial-controllers" when formatting the storage.
Steps to reproduce:
Clean up and set up the environment:
rm -rf /tmp/controllers && \
mkdir -p /tmp/controllers/c1 && \
mkdir -p /tmp/controllers/c2 && \
mkdir -p /tmp/controllers/c3
export KAFKA_HOME=<your_kafka_3_9_home>
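The report uses c1.properties, c2.properties and c3.properties but does not show them. Below is a minimal sketch (not the reporter's actual files) that writes controller-only configs matching the node IDs, ports and directories used in these steps. The property values are assumptions: a listener named CONTROLLER over PLAINTEXT, log.dirs pointing at the /tmp/controllers/cN directories created above, and controller.quorum.bootstrap.servers (instead of controller.quorum.voters) because the quorum is formatted with --initial-controllers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes c1.properties .. c3.properties for the three
# controllers in this report. All property values are assumptions, not
# taken from the original report.
set -euo pipefail

for i in 1 2 3; do
  id=$((1000 + i))      # node ids 1001, 1002, 1003
  port=$((10000 + i))   # controller ports 10001, 10002, 10003
  # Unquoted EOF so ${id}, ${port} and ${i} are expanded into the file.
  cat > "c${i}.properties" <<EOF
process.roles=controller
node.id=${id}
listeners=CONTROLLER://localhost:${port}
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:PLAINTEXT
log.dirs=/tmp/controllers/c${i}
controller.quorum.bootstrap.servers=localhost:10001,localhost:10002,localhost:10003
EOF
done
```

Run it from the directory where the format and start commands are executed, since those commands refer to the files by relative path.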
Format the controllers:
$KAFKA_HOME/bin/kafka-storage.sh format \
  --cluster-id 00000000-0000-0000-0000-000000000001 \
  --initial-controllers 1001@localhost:10001:AAAAAAAAAAEAAAAAAAAAAA,1002@localhost:10002:AAAAAAAAAAEAAAAAAAAAAA,1003@localhost:10003:AAAAAAAAAAEAAAAAAAAAAA \
  --config c1.properties
$KAFKA_HOME/bin/kafka-storage.sh format \
  --cluster-id 00000000-0000-0000-0000-000000000001 \
  --initial-controllers 1001@localhost:10001:AAAAAAAAAAEAAAAAAAAAAA,1002@localhost:10002:AAAAAAAAAAEAAAAAAAAAAA,1003@localhost:10003:AAAAAAAAAAEAAAAAAAAAAA \
  --config c2.properties
$KAFKA_HOME/bin/kafka-storage.sh format \
  --cluster-id 00000000-0000-0000-0000-000000000001 \
  --initial-controllers 1001@localhost:10001:AAAAAAAAAAEAAAAAAAAAAA,1002@localhost:10002:AAAAAAAAAAEAAAAAAAAAAA,1003@localhost:10003:AAAAAAAAAAEAAAAAAAAAAA \
  --config c3.properties
Start the controllers in separate terminals:
$KAFKA_HOME/bin/kafka-run-class.sh -name kafkaService kafka.Kafka c1.properties
$KAFKA_HOME/bin/kafka-run-class.sh -name kafkaService kafka.Kafka c2.properties
$KAFKA_HOME/bin/kafka-run-class.sh -name kafkaService kafka.Kafka c3.properties
Remove a controller:
$KAFKA_HOME/bin/kafka-metadata-quorum.sh \
  --bootstrap-controller localhost:10001,localhost:10002,localhost:10003,localhost:10004 \
  remove-controller --controller-id 1001 --controller-directory-id AAAAAAAAAAEAAAAAAAAAAA
The removed controller (1001) crashes with the following error:
[2024-10-09 15:19:15,574] ERROR Encountered fatal fault: exception while renouncing leadership (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
java.lang.RuntimeException: Unable to reset to last stable offset 55. No in-memory snapshot found for this offset.
	at org.apache.kafka.controller.OffsetControlManager.deactivate(OffsetControlManager.java:268)
	at org.apache.kafka.controller.QuorumController.renounce(QuorumController.java:1281)
	at org.apache.kafka.controller.QuorumController.handleEventException(QuorumController.java:552)
	at org.apache.kafka.controller.QuorumController.access$800(QuorumController.java:180)
	at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:885)
	at org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:875)
	at org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:153)
	at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:142)
	at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:215)
	at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:186)
	at java.base/java.lang.Thread.run(Thread.java:840)
If the process that died is restarted, it rejoins the cluster and becomes an
observer, as expected.
The crash doesn't happen in a slightly different case. I don't have the exact
steps, but the idea is this:
1. Create a 3-controller cluster as above
2. Format and start a 4th controller.
3. Add the 4th controller as a voter.
4. Remove the 4th controller to make it an observer. It becomes an observer, as
expected.
Because this case works, my guess is that the crash is related to the removed
controller being one of the initial controllers.
I didn't dig deeper into why the crash occurs.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)