Paolo Patierno created KAFKA-16005:
--------------------------------------
Summary: ZooKeeper to KRaft migration rollback missing disabling
controller and migration configuration on brokers
Key: KAFKA-16005
URL: https://issues.apache.org/jira/browse/KAFKA-16005
Project: Kafka
Issue Type: Bug
Components: documentation
Affects Versions: 3.6.1
Reporter: Paolo Patierno
I was following the latest documentation additions to try the rollback process
of a ZK cluster migrating to KRaft, while it's still in dual-write mode:
[https://github.com/apache/kafka/pull/14160/files#diff-e4e8d893dc2a4e999c96713dd5b5857203e0756860df0e70fb0cb041aa4d347bR3786]
The first step only covers stopping the broker, deleting the __cluster_metadata
folder and restarting the broker.
I think it's missing at least the following steps:
* removing/disabling the ZooKeeper migration flag
* removing all properties related to the controller configuration (e.g.
controller.quorum.voters, controller.listener.names, ...)
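For illustration only, the broker-side change could look roughly like the fragment below, with the listed properties removed or commented out in the broker's server.properties before the restart. This is a sketch, not text from the documentation: zookeeper.metadata.migration.enable is the migration flag as I understand it, the voter entry assumes the single local controller (node 1 on port 9093) from my setup, and the listener name CONTROLLER is an assumption.
{code}
# Migration flag: remove it or set it to false (assumed to be the flag meant above)
# zookeeper.metadata.migration.enable=true

# Controller quorum settings added for the migration (values are assumptions)
# controller.quorum.voters=1@localhost:9093
# controller.listener.names=CONTROLLER
{code}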
Without those steps, when the broker restarts, it re-creates the
__cluster_metadata folder (because it syncs with the controllers while they are
still running).
Also, when the controllers stop, the broker starts raising exceptions like this:
{code:java}
[2023-12-13 15:22:28,437] DEBUG [BrokerToControllerChannelManager id=0 name=quorum] Connection with localhost/127.0.0.1 (channelId=1) disconnected (org.apache.kafka.common.network.Selector)
java.net.ConnectException: Connection refused
    at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
    at org.apache.kafka.common.network.PlaintextTransportLayer.finishConnect(PlaintextTransportLayer.java:50)
    at org.apache.kafka.common.network.KafkaChannel.finishConnect(KafkaChannel.java:224)
    at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:526)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:481)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:571)
    at org.apache.kafka.server.util.InterBrokerSendThread.pollOnce(InterBrokerSendThread.java:109)
    at kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManager.scala:421)
    at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:130)
[2023-12-13 15:22:28,438] INFO [BrokerToControllerChannelManager id=0 name=quorum] Node 1 disconnected. (org.apache.kafka.clients.NetworkClient)
[2023-12-13 15:22:28,438] WARN [BrokerToControllerChannelManager id=0 name=quorum] Connection to node 1 (localhost/127.0.0.1:9093) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
{code}
(where I have the controller running locally on port 9093)