[ https://issues.apache.org/jira/browse/KAFKA-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luke Chen resolved KAFKA-16101.
-------------------------------
    Resolution: Fixed

> KRaft migration rollback documentation is incorrect
> ---------------------------------------------------
>
>                 Key: KAFKA-16101
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16101
>             Project: Kafka
>          Issue Type: Bug
>          Components: kraft
>    Affects Versions: 3.6.1
>            Reporter: Paolo Patierno
>            Assignee: Colin McCabe
>            Priority: Blocker
>             Fix For: 3.7.0
>
> Hello,
> I was trying the KRaft migration rollback procedure locally and came across a potential bug, or at least a situation where the cluster is not usable/available for a certain amount of time.
> To test the procedure, I start with a cluster of one broker (broker ID = 0) and one ZooKeeper node. Then I start the migration with one KRaft controller node (node ID = 1). The migration runs fine and reaches the "dual write" state.
> From this point, I try to run the rollback procedure as described in the documentation.
> As a first step, this involves ...
> * stopping the broker
> * removing the __cluster_metadata folder
> * removing the ZooKeeper migration flag and the controller-related configuration from the broker
> * restarting the broker
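> For reference, the broker-side part of these steps might look like the following sketch (the log.dirs path /tmp/kafka-logs and the exact property values are assumptions for a local setup, not taken from the documentation):
> {code:bash}
> # Stop the broker, then remove the KRaft metadata log from the broker's
> # log directory (log.dirs is assumed to be /tmp/kafka-logs in this sketch).
> rm -rf /tmp/kafka-logs/__cluster_metadata-0
>
> # In the broker's server.properties, remove the migration flag and the
> # KRaft controller settings, i.e. lines like these:
> #   zookeeper.metadata.migration.enable=true
> #   controller.quorum.voters=1@localhost:9093
> #   controller.listener.names=CONTROLLER
>
> # Restart the broker, now in pure ZooKeeper mode.
> bin/kafka-server-start.sh config/server.properties
> {code}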
> With the above steps done, the broker starts in ZooKeeper mode (no migration, no knowledge of the KRaft controllers) and keeps logging the following messages at DEBUG level:
> {code:java}
> [2024-01-08 11:51:20,608] DEBUG [zk-broker-0-to-controller-forwarding-channel-manager]: Controller isn't cached, looking for local metadata changes (kafka.server.BrokerToControllerRequestThread)
> [2024-01-08 11:51:20,608] DEBUG [zk-broker-0-to-controller-forwarding-channel-manager]: No controller provided, retrying after backoff (kafka.server.BrokerToControllerRequestThread)
> [2024-01-08 11:51:20,629] DEBUG [zk-broker-0-to-controller-alter-partition-channel-manager]: Controller isn't cached, looking for local metadata changes (kafka.server.BrokerToControllerRequestThread)
> [2024-01-08 11:51:20,629] DEBUG [zk-broker-0-to-controller-alter-partition-channel-manager]: No controller provided, retrying after backoff (kafka.server.BrokerToControllerRequestThread)
> {code}
> What's happening should be clear.
> The /controller znode in ZooKeeper still reports the KRaft controller (node ID = 1) as the active controller. The broker reads it from the znode but doesn't know how to reach it.
> The issue is that until the procedure is fully completed with the next steps (shutting down the KRaft controller, deleting the /controller znode), the cluster is unusable. Any admin or client operation against the broker doesn't work; it just hangs, and the broker never replies.
> Extending this scenario to a more complex one, with 10-20-50 brokers and partition replicas spread across them: as the brokers are rolled one by one (into ZK mode) and start reporting the above error, the topics become unavailable one after the other, until all brokers are in this state and nothing works. This is because, from the perspective of the (still running) KRaft controller, the brokers are no longer available and the partition replicas are out of sync.
> Of course, as soon as you complete the rollback procedure by deleting the /controller znode, the brokers are able to elect a new controller among themselves and everything recovers.
> My first question ... isn't the cluster supposed to keep working and remain available during the rollback, while the procedure is not completed yet?
> Or is cluster unavailability an accepted assumption during the rollback, until it's fully completed?
> This "unavailability" time window could be reduced by deleting the /controller znode before shutting down the KRaft controllers, to allow the brokers to elect a new controller among themselves; but in this case, could there be a race condition where the still-running KRaft controllers steal leadership again? (A sketch of this reordering follows below.)
> Or is there maybe something missing in the documentation which is leading to this problem?
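> To make that reordering concrete, deleting the znode while the KRaft controllers are still running would be something like this sketch, using the ZooKeeper shell bundled with Kafka (the localhost:2181 connection string is an assumption for a local setup):
> {code:bash}
> # Check which node the /controller znode currently points to
> # (at this stage it still reports the KRaft controller).
> bin/zookeeper-shell.sh localhost:2181 get /controller
>
> # Delete the znode so the ZK-mode brokers can elect a new controller
> # among themselves; the open question is whether a still-running
> # KRaft controller could immediately re-claim it.
> bin/zookeeper-shell.sh localhost:2181 delete /controller
> {code}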