ppatierno commented on code in PR #15193: URL: https://github.com/apache/kafka/pull/15193#discussion_r1458520337
########## docs/ops.html: ########## @@ -3992,6 +3979,75 @@ <h3>Finalizing the migration</h3> # Other configs ...</pre> + <h3>Reverting to ZooKeeper mode During the Migration</h3> + <p> + While the cluster is still in migration mode, it is possible to revert to ZK mode. The process + to follow depends on how far the migration has progressed. In order to find out how to revert, + select the <b>final</b> migration step that you have <b>completed</b> in this table. + </p> + <p> + Note that the directions given here assume that each step was fully completed, and they were + done in order. So, for example, we assume that if "Enabling the migration on the brokers" was completed, Review Comment: Referring to "Enabling the migration on the brokers", reinforce the comment I made above. This is a better title to refer to, so with my previous comment I would leave as it is. ########## docs/ops.html: ########## @@ -3992,6 +3979,75 @@ <h3>Finalizing the migration</h3> # Other configs ...</pre> + <h3>Reverting to ZooKeeper mode During the Migration</h3> + <p> + While the cluster is still in migration mode, it is possible to revert to ZK mode. The process Review Comment: ```suggestion While the cluster is still in migration mode, it is possible to revert to ZooKeeper mode. The process ``` ########## docs/ops.html: ########## @@ -3992,6 +3979,75 @@ <h3>Finalizing the migration</h3> # Other configs ...</pre> + <h3>Reverting to ZooKeeper mode During the Migration</h3> + <p> + While the cluster is still in migration mode, it is possible to revert to ZK mode. The process + to follow depends on how far the migration has progressed. In order to find out how to revert, + select the <b>final</b> migration step that you have <b>completed</b> in this table. + </p> + <p> + Note that the directions given here assume that each step was fully completed, and they were + done in order. So, for example, we assume that if "Enabling the migration on the brokers" was completed, + "Provisioning the KRaft controller quorum" was also fully completed previously. + </p> + <p> + If you did not fully complete any step, back out whatever you have done and then follow revert + directions for the last fully completed step. + </p> + + <table class="data-table"> + <tbody> + <tr> + <th>Final Migration Section Completed</th> + <th>Directions for Reverting</th> + <th>Notes</th> + </tr> + <tr> + <td>Preparing for migration</td> + <td>The prepartion section does not involve leaving ZK mode. So there is nothing to do in + the case of a revert.</td> + <td></td> + </tr> + <tr> + <td>Provisioning the KRaft controller quorum</td> + <td>Deprovision the KRaft controller quorum, and then you are done.</td> + <td></td> + </tr> + <tr> + <td>Enabling zookeeper.metadata.migration.enable on the brokers</td> Review Comment: Again I would leave the previous title. ########## docs/ops.html: ########## @@ -3992,6 +3979,75 @@ <h3>Finalizing the migration</h3> # Other configs ...</pre> + <h3>Reverting to ZooKeeper mode During the Migration</h3> + <p> + While the cluster is still in migration mode, it is possible to revert to ZK mode. The process + to follow depends on how far the migration has progressed. In order to find out how to revert, + select the <b>final</b> migration step that you have <b>completed</b> in this table. + </p> + <p> + Note that the directions given here assume that each step was fully completed, and they were + done in order. So, for example, we assume that if "Enabling the migration on the brokers" was completed, + "Provisioning the KRaft controller quorum" was also fully completed previously. + </p> + <p> + If you did not fully complete any step, back out whatever you have done and then follow revert + directions for the last fully completed step. + </p> + + <table class="data-table"> + <tbody> + <tr> + <th>Final Migration Section Completed</th> + <th>Directions for Reverting</th> + <th>Notes</th> + </tr> + <tr> + <td>Preparing for migration</td> + <td>The prepartion section does not involve leaving ZK mode. So there is nothing to do in + the case of a revert.</td> + <td></td> + </tr> + <tr> + <td>Provisioning the KRaft controller quorum</td> + <td>Deprovision the KRaft controller quorum, and then you are done.</td> + <td></td> + </tr> + <tr> + <td>Enabling zookeeper.metadata.migration.enable on the brokers</td> + <td>Roll the broker cluster with zookeeper.metadata.migration.enable=false. Then, + deprovision the KRaft controller quorum. Then you are done.</td> + <td></td> + </tr> + <tr> + <td>Migrating brokers to KRaft</td> + <td><p> + Roll the broker cluster with the process.roles configuration omitted, node.id + replaced with broker.id, and the zookeeper.connect configuration set to a valid + value. Review Comment: In my tests you also need to add `zookeeper.metadata.migration.enable=true` again. Without setting it, the broker fails with the following exception: ```shell [2024-01-19 09:15:45,246] ERROR Exiting Kafka due to fatal exception (kafka.Kafka$) java.lang.IllegalArgumentException: requirement failed: controller.listener.names must be empty when not running in KRaft mode: [CONTROLLER] at scala.Predef$.require(Predef.scala:281) at kafka.server.KafkaConfig.validateValues(KafkaConfig.scala:2380) at kafka.server.KafkaConfig.<init>(KafkaConfig.scala:2233) at kafka.server.KafkaConfig.<init>(KafkaConfig.scala:1604) at kafka.server.KafkaConfig$.fromProps(KafkaConfig.scala:1527) at kafka.Kafka$.buildServer(Kafka.scala:72) at kafka.Kafka$.main(Kafka.scala:91) at kafka.Kafka.main(Kafka.scala) ``` ########## docs/ops.html: ########## @@ -3992,6 +3979,75 @@ <h3>Finalizing the migration</h3> # Other configs ...</pre> + <h3>Reverting to ZooKeeper mode During the Migration</h3> + <p> + While the cluster is still in migration mode, it is possible to revert to ZK mode. The process + to follow depends on how far the migration has progressed. In order to find out how to revert, + select the <b>final</b> migration step that you have <b>completed</b> in this table. + </p> + <p> + Note that the directions given here assume that each step was fully completed, and they were + done in order. So, for example, we assume that if "Enabling the migration on the brokers" was completed, + "Provisioning the KRaft controller quorum" was also fully completed previously. + </p> + <p> + If you did not fully complete any step, back out whatever you have done and then follow revert + directions for the last fully completed step. + </p> + + <table class="data-table"> + <tbody> + <tr> + <th>Final Migration Section Completed</th> + <th>Directions for Reverting</th> + <th>Notes</th> + </tr> + <tr> + <td>Preparing for migration</td> + <td>The prepartion section does not involve leaving ZK mode. So there is nothing to do in + the case of a revert.</td> + <td></td> + </tr> + <tr> + <td>Provisioning the KRaft controller quorum</td> + <td>Deprovision the KRaft controller quorum, and then you are done.</td> + <td></td> + </tr> + <tr> + <td>Enabling zookeeper.metadata.migration.enable on the brokers</td> + <td>Roll the broker cluster with zookeeper.metadata.migration.enable=false. Then, + deprovision the KRaft controller quorum. Then you are done.</td> + <td></td> + </tr> + <tr> + <td>Migrating brokers to KRaft</td> + <td><p> + Roll the broker cluster with the process.roles configuration omitted, node.id + replaced with broker.id, and the zookeeper.connect configuration set to a valid + value. + </p> + <p> + After this roll is fully complete, perform a second roll where you set + zookeeper.metadata.migration.enable=false. Then, + deprovision the KRaft controller quorum. Then you are done. + </p> + </td> + <td> + Make sure that on the first roll, zookeeper.metadata.migration.enable remains set to + true. <b>Do not set it to false until the second roll.</b> Review Comment: I think we can remove this sentence if we take the comment I made above (the one about having zookeeper.metadata.migration.enable=true otherwise you get an exception on broker restart). Also, imagine a user following this steps one after the other, when they reach this point, it's too late to get the note. ########## docs/ops.html: ########## @@ -3992,6 +3979,75 @@ <h3>Finalizing the migration</h3> # Other configs ...</pre> + <h3>Reverting to ZooKeeper mode During the Migration</h3> + <p> + While the cluster is still in migration mode, it is possible to revert to ZK mode. The process + to follow depends on how far the migration has progressed. In order to find out how to revert, + select the <b>final</b> migration step that you have <b>completed</b> in this table. + </p> + <p> + Note that the directions given here assume that each step was fully completed, and they were + done in order. So, for example, we assume that if "Enabling the migration on the brokers" was completed, + "Provisioning the KRaft controller quorum" was also fully completed previously. + </p> + <p> + If you did not fully complete any step, back out whatever you have done and then follow revert + directions for the last fully completed step. + </p> + + <table class="data-table"> + <tbody> + <tr> + <th>Final Migration Section Completed</th> + <th>Directions for Reverting</th> + <th>Notes</th> + </tr> + <tr> + <td>Preparing for migration</td> + <td>The prepartion section does not involve leaving ZK mode. So there is nothing to do in Review Comment: ```suggestion <td>The prepartion section does not involve leaving ZooKeeper mode. So there is nothing to do in ``` ########## docs/ops.html: ########## @@ -3992,6 +3979,75 @@ <h3>Finalizing the migration</h3> # Other configs ...</pre> + <h3>Reverting to ZooKeeper mode During the Migration</h3> + <p> + While the cluster is still in migration mode, it is possible to revert to ZK mode. The process + to follow depends on how far the migration has progressed. In order to find out how to revert, + select the <b>final</b> migration step that you have <b>completed</b> in this table. + </p> + <p> + Note that the directions given here assume that each step was fully completed, and they were + done in order. So, for example, we assume that if "Enabling the migration on the brokers" was completed, + "Provisioning the KRaft controller quorum" was also fully completed previously. + </p> + <p> + If you did not fully complete any step, back out whatever you have done and then follow revert + directions for the last fully completed step. + </p> + + <table class="data-table"> + <tbody> + <tr> + <th>Final Migration Section Completed</th> + <th>Directions for Reverting</th> + <th>Notes</th> + </tr> + <tr> + <td>Preparing for migration</td> + <td>The prepartion section does not involve leaving ZK mode. So there is nothing to do in + the case of a revert.</td> + <td></td> + </tr> + <tr> + <td>Provisioning the KRaft controller quorum</td> + <td>Deprovision the KRaft controller quorum, and then you are done.</td> + <td></td> + </tr> + <tr> + <td>Enabling zookeeper.metadata.migration.enable on the brokers</td> + <td>Roll the broker cluster with zookeeper.metadata.migration.enable=false. Then, + deprovision the KRaft controller quorum. Then you are done.</td> + <td></td> + </tr> + <tr> + <td>Migrating brokers to KRaft</td> + <td><p> + Roll the broker cluster with the process.roles configuration omitted, node.id + replaced with broker.id, and the zookeeper.connect configuration set to a valid + value. + </p> + <p> + After this roll is fully complete, perform a second roll where you set + zookeeper.metadata.migration.enable=false. Then, + deprovision the KRaft controller quorum. Then you are done. Review Comment: you should also remove the controller.quorum.voters and controller.listener.names from the broker configuration to come back being fully ZooKeeper again. But yeah doing just what the doc says, this is what I have from the broker restarting: ```shell [2024-01-19 09:21:31,690] DEBUG [zk-broker-0-to-controller-forwarding-channel-manager]: No controller provided, retrying after backoff (kafka.server.BrokerToControllerRequestThread) [2024-01-19 09:21:31,763] DEBUG [zk-broker-0-to-controller-alter-partition-channel-manager]: Controller isn't cached, looking for local metadata changes (kafka.server.BrokerToControllerRequestThread) ``` This is also because the /controller znode is still pointing to a KRaft controller. I would say that after the first roll, the correct sequence should be: * Shutdown controllers * Remove /controller znode from ZooKeeper (and with controllers not running they cannot steal the leadership from brokers) * Remove ZooKeeper migration flag and controllers configuration from brokers and roll the brokers, they will elect a new leader among them. The above make my rollback process fully working. ########## docs/ops.html: ########## @@ -3870,7 +3870,7 @@ <h3>Provisioning the KRaft controller quorum</h3> <p><em>Note: The KRaft cluster <code>node.id</code> values must be different from any existing ZK broker <code>broker.id</code>. In KRaft-mode, the brokers and controllers share the same Node ID namespace.</em></p> - <h3>Enabling the migration on the brokers</h3> + <h3>Enabling zookeeper.metadata.migration.enable on the brokers</h3> Review Comment: Tbh I liked the previous title more. The need to set `zookeeper.metadata.migration.enable=true` is clearly explained in the section, why putting it into the title? It makes the title a little bit weird imho. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org