[ https://issues.apache.org/jira/browse/KAFKA-18930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17932938#comment-17932938 ]
Luke Chen commented on KAFKA-18930:
-----------------------------------
[~davidarthur] [~mumrah], I'd like to hear your thoughts on this issue. Thanks.
> KRaft MigrationEvent won't retry when failing to write data to ZK
> ------------------------------------------------------------------
>
> Key: KAFKA-18930
> URL: https://issues.apache.org/jira/browse/KAFKA-18930
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.9.0
> Reporter: Luke Chen
> Priority: Major
>
> During a ZK-to-KRaft migration, the cluster runs in dual-write mode. In that
> mode, metadata is written to KRaft first and then written to ZK
> asynchronously. If the ZK write fails with an exception, the KRaft
> MigrationEvent is not retried. That causes metadata inconsistency between
> KRaft and ZK.
>
> Note:
> 1. During a clean shutdown of the KRaft controller, we should keep retrying
> the failed ZK write until the shutdown is forced, to make sure the metadata
> is consistent.
> 2. During shutdown, [the order of
> shutdown|https://github.com/apache/kafka/blob/1ec1043d5197c4f807fa5cbc41d875b289443096/core/src/main/scala/kafka/server/ControllerServer.scala#L69-L76]
> is: close ZK -> close RPC Client -> close migration driver. That causes
> another issue: even if we retry the ZK write, it can never succeed while
> shutdown is in progress, because the ZK connection is closed first.
>
> The impact is that, when rolling back to ZK mode during migration, the
> metadata in ZK is out of date.
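
To make the retry behavior discussed above concrete, here is a minimal sketch of retrying a failed ZK write with capped exponential backoff until a deadline. It is not the actual KRaftMigrationDriver code: the class `ZkWriteRetrySketch`, the `ZkWrite` interface, and `retryUntilDeadline` are hypothetical names, and the timeout merely stands in for the point at which shutdown is forced.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;

public class ZkWriteRetrySketch {

    /** Hypothetical stand-in for one ZK write issued during dual-write mode. */
    @FunctionalInterface
    public interface ZkWrite {
        void run() throws Exception;
    }

    /**
     * Retries the write with capped exponential backoff until it succeeds or
     * the timeout (assumed here to model the forced-shutdown deadline) elapses.
     * Returns true if the write eventually succeeded, false if the deadline passed.
     */
    public static boolean retryUntilDeadline(ZkWrite write, Duration timeout,
                                             long initialBackoffMs, long maxBackoffMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        long backoffMs = initialBackoffMs;
        while (true) {
            try {
                write.run();
                return true;
            } catch (InterruptedException ie) {
                throw ie; // do not swallow interruption
            } catch (Exception e) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false; // deadline reached: give up
                }
                // Never sleep past the deadline; sleep at least 1 ms.
                long sleepMs = Math.min(backoffMs, Math.max(1, remainingNanos / 1_000_000));
                Thread.sleep(sleepMs);
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        // Simulated write that fails twice, then succeeds on the third attempt.
        boolean ok = retryUntilDeadline(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("simulated ZK write failure");
            }
        }, Duration.ofSeconds(5), 10, 100);
        System.out.println("succeeded=" + ok + " attempts=" + calls.get());
    }
}
```

For a retry like this to help during a clean shutdown, the ZK connection would have to stay open until the migration driver has been closed, which is the reverse of the shutdown order described in note 2.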
--
This message was sent by Atlassian Jira
(v8.20.10#820010)