mumrah opened a new pull request, #15918: URL: https://github.com/apache/kafka/pull/15918
When becoming the active KRaftMigrationDriver, there is another race condition similar to KAFKA-16171. This time, the race is due to a stale read from ZK. After writing to `/controller` and `/controller_epoch`, it is possible that a subsequent read of `/migration` is not linearized with the writes that were just made. In other words, we get a stale read on `/migration`. This leaves the driver unable to sync metadata to ZK, because it holds an incorrect zkVersion for the migration ZNode.

The non-linearizability of reads is in fact documented behavior for ZK, so we need to handle it.

To fix the stale read, this patch adds a write to `/migration` after updating `/controller` and `/controller_epoch`. This allows us to learn the correct zkVersion for the migration ZNode before leaving the BECOME_CONTROLLER state.

This patch also adds a check on the current leader epoch when running certain events in KRaftMigrationDriver. Historically, we did not include this check because it is not necessary for correctness. Writes to ZK are gated on the `/controller_epoch` zkVersion, and RPCs sent to brokers are gated on the controller epoch. However, during a time of rapid failover, there is a lot of processing happening on the controller (i.e., full metadata sync to ZK and full UMRs sent to brokers), so it is best to avoid running events we know will fail.

There is also a small fix in here to improve the logging of ZK operations. The log messages are changed to past tense to reflect the fact that the operations have already happened by the time the log message is created.
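
For illustration only, here is a minimal sketch of why a write lets the driver learn the correct zkVersion where a read may not. It is not code from this patch: `StaleReadSketch`, `FakeZnode`, and `tryWrite` are hypothetical names, and the in-memory node only imitates ZooKeeper's versioned `setData` semantics (an expected version of -1 matches any version, and a successful write returns the new version).

```java
import java.nio.charset.StandardCharsets;

public class StaleReadSketch {
    /** Hypothetical in-memory stand-in for a znode: data plus a version bumped on every write. */
    static final class FakeZnode {
        private byte[] data = new byte[0];
        private int version = 0;

        /** Imitates versioned setData: -1 matches any version, otherwise versions must agree. */
        int setData(byte[] newData, int expectedVersion) {
            if (expectedVersion != -1 && expectedVersion != version) {
                throw new IllegalStateException(
                    "bad version: expected " + expectedVersion + ", actual " + version);
            }
            data = newData;
            version++;
            return version; // the writer always learns the post-write version
        }

        int version() {
            return version;
        }
    }

    /** Attempts a conditional write and reports whether the version check passed. */
    static boolean tryWrite(FakeZnode node, String payload, int expectedVersion) {
        try {
            node.setData(payload.getBytes(StandardCharsets.UTF_8), expectedVersion);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        FakeZnode migration = new FakeZnode();

        // Simulate a stale read: the version is observed before a later write lands.
        int staleVersion = migration.version();
        migration.setData("previous-controller".getBytes(StandardCharsets.UTF_8), -1);

        // Syncing with the stale version is rejected.
        System.out.println(tryWrite(migration, "metadata-sync", staleVersion)); // prints false

        // Writing first returns the up-to-date version, so the next sync succeeds.
        int learnedVersion = migration.setData("new-controller".getBytes(StandardCharsets.UTF_8), -1);
        System.out.println(tryWrite(migration, "metadata-sync", learnedVersion)); // prints true
    }
}
```

The sketch models the stale read as a version captured before another write; in a real ensemble the staleness comes from reading through a server that has not yet applied the latest writes.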