[ https://issues.apache.org/jira/browse/KAFKA-17966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17897799#comment-17897799 ]
Luke Chen commented on KAFKA-17966:
-----------------------------------
[~jsancio] [~cmccabe], any thoughts about this?
> Controller replacement does not support scaling up before scaling down
> ----------------------------------------------------------------------
>
> Key: KAFKA-17966
> URL: https://issues.apache.org/jira/browse/KAFKA-17966
> Project: Kafka
> Issue Type: Bug
> Components: kraft
> Affects Versions: 3.9.0
> Reporter: Federico Valeri
> Priority: Major
>
> In KRaft, complex quorum changes are implemented as a series of
> single-controller changes. In this case, it is preferable to add controllers
> before removing them. For example, to replace a controller in a
> three-controller cluster, adding the new controller first and only then
> removing the old one lets the quorum tolerate one controller failure at
> every point in the process. This is currently not possible, because adding
> the replacement fails with DuplicateVoterException, so you are forced to
> scale down first and then scale up. The intended add-then-remove sequence
> is sketched below.
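> A minimal sketch of that sequence, assuming the KIP-853 add-controller and
> remove-controller subcommands; the endpoint, node ID, and directory ID are
> illustrative, taken from the reproduction below:
> {code}
> # Add the replacement controller first, so the quorum never drops below
> # three voters.
> $ bin/kafka-metadata-quorum.sh \
>   --bootstrap-controller localhost:8000 \
>   --command-config /opt/kafka/server2/config/server.properties \
>   add-controller
> # Only then remove the failed incarnation, identified by node ID plus
> # directory ID.
> $ bin/kafka-metadata-quorum.sh \
>   --bootstrap-controller localhost:8000 \
>   --command-config /opt/kafka/server2/config/server.properties \
>   remove-controller \
>   --controller-id 2 \
>   --controller-directory-id slcsM5ZAR0SMIF_u__MAeg
> {code}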
> Example:
> The operator replaces a failed disk with a new one. The replacement disk
> must be formatted, which assigns it a new directory ID.
> {code}
> $ CLUSTER_ID="$(bin/kafka-cluster.sh cluster-id --bootstrap-server localhost:9092 | awk -F': ' '{print $2}')"
> $ bin/kafka-storage.sh format \
>   --config /opt/kafka/server2/config/server.properties \
>   --cluster-id "$CLUSTER_ID" \
>   --no-initial-controllers \
>   --ignore-formatted
> Formatting metadata directory /opt/kafka/server2/metadata with metadata.version 3.9-IV0.
> {code}
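> If needed, the newly assigned directory ID can be read back from
> meta.properties (a sketch, assuming the metadata directory layout shown
> above; the ID value here is illustrative):
> {code}
> # The format step writes the generated directory ID into meta.properties.
> $ grep directory.id /opt/kafka/server2/metadata/meta.properties
> directory.id=wrqMDI1WDsqaooVSOtlgYw
> {code}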
> After restarting the controller, the quorum description shows two entries
> for node ID 2: the original incarnation, left as a Follower with an
> ever-growing lag because of the failed disk, and the new incarnation, which
> has a different directory ID and Observer status.
> {code}
> $ bin/kafka-metadata-quorum.sh --bootstrap-controller localhost:8000 \
>   describe --replication --human-readable
> NodeId  DirectoryId             LogEndOffset  Lag  LastFetchTimestamp  LastCaughtUpTimestamp  Status
> 0       pbvuBlaTTwKRxS5NLJwRFQ  535           0    6 ms ago            6 ms ago               Leader
> 1       QjRpFtVDTtCa8OLXiSbmmA  535           0    283 ms ago          283 ms ago             Follower
> 2       slcsM5ZAR0SMIF_u__MAeg  407           128  63307 ms ago        63802 ms ago           Follower
> 2       wrqMDI1WDsqaooVSOtlgYw  535           0    281 ms ago          281 ms ago             Observer
> 8       aXLz3ixjqzXhCYqKHRD4WQ  535           0    284 ms ago          284 ms ago             Observer
> 7       KCriHQZm3TlxvEVNgyWKJw  535           0    284 ms ago          284 ms ago             Observer
> 9       v5nnIwK8r0XqjyqlIPW-aw  535           0    284 ms ago          284 ms ago             Observer
> {code}
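> One way to tell when the new incarnation has caught up is to poll the quorum
> description until its lag reaches zero (a sketch, assuming the column layout
> shown above; node ID 2 and the endpoint are taken from this reproduction):
> {code}
> # Poll every 5 seconds until the observer with node ID 2 reports Lag == 0.
> $ while true; do
>     LAG=$(bin/kafka-metadata-quorum.sh --bootstrap-controller localhost:8000 \
>       describe --replication | awk '$1 == 2 && $7 == "Observer" {print $4}')
>     [ "$LAG" = "0" ] && break
>     sleep 5
>   done
> {code}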
> Once the new controller is in sync with the leader, we attempt the scale up.
> {code}
> $ bin/kafka-metadata-quorum.sh \
>   --bootstrap-controller localhost:8000 \
>   --command-config /opt/kafka/server2/config/server.properties \
>   add-controller
> org.apache.kafka.common.errors.DuplicateVoterException: The voter id for ReplicaKey(id=2, directoryId=Optional[u7e_mCmg0VAIz0zuAOcraA]) is already part of the set of voters [ReplicaKey(id=0, directoryId=Optional[PbEthh6mR8iVNizvUTUVFw]), ReplicaKey(id=1, directoryId=Optional[kIpbbU79QaCIIiOLOyCjJg]), ReplicaKey(id=2, directoryId=Optional[2ab0gajpS5aUf5d-2Jw02w])].
> java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.DuplicateVoterException: The voter id for ReplicaKey(id=2, directoryId=Optional[u7e_mCmg0VAIz0zuAOcraA]) is already part of the set of voters [ReplicaKey(id=0, directoryId=Optional[PbEthh6mR8iVNizvUTUVFw]), ReplicaKey(id=1, directoryId=Optional[kIpbbU79QaCIIiOLOyCjJg]), ReplicaKey(id=2, directoryId=Optional[2ab0gajpS5aUf5d-2Jw02w])].
>     at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
>     at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
>     at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
>     at org.apache.kafka.tools.MetadataQuorumCommand.handleAddController(MetadataQuorumCommand.java:431)
>     at org.apache.kafka.tools.MetadataQuorumCommand.execute(MetadataQuorumCommand.java:147)
>     at org.apache.kafka.tools.MetadataQuorumCommand.mainNoExit(MetadataQuorumCommand.java:81)
>     at org.apache.kafka.tools.MetadataQuorumCommand.main(MetadataQuorumCommand.java:76)
> Caused by: org.apache.kafka.common.errors.DuplicateVoterException: The voter id for ReplicaKey(id=2, directoryId=Optional[u7e_mCmg0VAIz0zuAOcraA]) is already part of the set of voters [ReplicaKey(id=0, directoryId=Optional[PbEthh6mR8iVNizvUTUVFw]), ReplicaKey(id=1, directoryId=Optional[kIpbbU79QaCIIiOLOyCjJg]), ReplicaKey(id=2, directoryId=Optional[2ab0gajpS5aUf5d-2Jw02w])].
> {code}
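> Until this is fixed, the operator is forced into the reverse order,
> temporarily leaving the quorum with only two voters (a sketch; the old
> directory ID is taken from the describe output above, and the flags assume
> the KIP-853 CLI):
> {code}
> # Scale down: remove the failed incarnation by node ID and directory ID.
> $ bin/kafka-metadata-quorum.sh \
>   --bootstrap-controller localhost:8000 \
>   --command-config /opt/kafka/server2/config/server.properties \
>   remove-controller \
>   --controller-id 2 \
>   --controller-directory-id slcsM5ZAR0SMIF_u__MAeg
> # Scale up: register the new incarnation as a voter.
> $ bin/kafka-metadata-quorum.sh \
>   --bootstrap-controller localhost:8000 \
>   --command-config /opt/kafka/server2/config/server.properties \
>   add-controller
> {code}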