Luke Chen created KAFKA-16132:
---------------------------------

             Summary: Upgrading from 3.6 to 3.7 in KRaft will have seconds of partitions unavailable
                 Key: KAFKA-16132
                 URL: https://issues.apache.org/jira/browse/KAFKA-16132
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 3.7.0
            Reporter: Luke Chen

When upgrading from 3.6 to 3.7, we noticed that after upgrading the metadata version, all partitions are reset at the same time, which makes them unavailable for a short period. This did not happen before.

{code:java}
[2024-01-15 20:45:19,757] INFO [BrokerMetadataPublisher id=2] Updating metadata.version to 19 at offset OffsetAndEpoch(offset=229, epoch=2). (kafka.server.metadata.BrokerMetadataPublisher)
[2024-01-15 20:45:29,915] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions Set(t1-29, t1-25, t1-21, t1-17, t1-46, t1-13, t1-42, t1-9, t1-38, t1-5, t1-34, t1-1, t1-30, t1-26, t1-22, t1-18, t1-47, t1-14, t1-43, t1-10, t1-39, t1-6, t1-35, t1-2, t1-31, t1-27, t1-23, t1-19, t1-48, t1-15, t1-44, t1-11, t1-40, t1-7, t1-36, t1-3, t1-32, t1-28, t1-24, t1-20, t1-49, t1-16, t1-45, t1-12, t1-41, t1-8, t1-37, t1-4, t1-33, t1-0) (kafka.server.ReplicaFetcherManager)
{code}

Complete log: https://gist.github.com/showuon/665aa3ce6afd59097a2662f8260ecc10

Steps:
1. Start a 3.6 Kafka cluster in KRaft mode with 1 broker
2. Create a topic
3. Upgrade the binary to 3.7
4. Use kafka-features.sh to upgrade to the 3.7 metadata version
5. Check the log (and metrics if interested)

Analysis:
In 3.7, we have JBOD support in KRaft, so the partitionRegistration gained a new directory field. This new field shows up as a difference when the metadata delta is compared, and that difference triggers the reset. We might be able to detect that a change which only adds the directory does not need to reset the leader/follower state and only needs a metadata update, which would avoid the unavailability.
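
The check suggested in the analysis could look roughly like the Java sketch below. It is only an illustration under stated assumptions: `PartitionState`, `withoutDirectories`, and `isDirectoryOnlyChange` are hypothetical names invented for this example, the record is a simplified stand-in rather than Kafka's actual `PartitionRegistration`, and the directory value `"dir-a"` is made up. The idea is to compare the old and new state with the directory field blanked out, and treat the change as metadata-only when nothing else differs.

```java
import java.util.List;

public class DirectoryOnlyChangeSketch {

    // Hypothetical, simplified stand-in for a partition's metadata.
    // This is NOT Kafka's PartitionRegistration; the fields are illustrative.
    record PartitionState(List<Integer> replicas,
                          List<String> directories,
                          int leader,
                          int leaderEpoch) {

        // Copy of this state with the directory assignment blanked out.
        PartitionState withoutDirectories() {
            return new PartitionState(replicas, List.of(), leader, leaderEpoch);
        }
    }

    // True when the two states differ, but only in the directories field.
    static boolean isDirectoryOnlyChange(PartitionState prev, PartitionState next) {
        return !prev.equals(next)
            && prev.withoutDirectories().equals(next.withoutDirectories());
    }

    public static void main(String[] args) {
        PartitionState before = new PartitionState(List.of(2), List.of(), 2, 0);
        // Same partition after the metadata version upgrade: only the directory is new.
        PartitionState upgraded = new PartitionState(List.of(2), List.of("dir-a"), 2, 0);
        // A real leadership change: the leader epoch is bumped.
        PartitionState reelected = new PartitionState(List.of(2), List.of("dir-a"), 2, 1);

        System.out.println(isDirectoryOnlyChange(before, upgraded));    // true
        System.out.println(isDirectoryOnlyChange(upgraded, reelected)); // false
    }
}
```

In this sketch a caller would skip the leader/follower reset when `isDirectoryOnlyChange` returns true and only store the new metadata. Whether that is safe in the real broker code path is exactly the open question of this ticket.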