Luke Chen created KAFKA-16132:
---------------------------------

             Summary: Upgrading from 3.6 to 3.7 in KRaft will have seconds of 
partitions unavailable
                 Key: KAFKA-16132
                 URL: https://issues.apache.org/jira/browse/KAFKA-16132
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 3.7.0
            Reporter: Luke Chen


When upgrading from 3.6 to 3.7, we noticed that after upgrading the metadata 
version, all the partitions are reset at once, which causes a short period of 
unavailability. This did not happen before. 


{code:java}
[2024-01-15 20:45:19,757] INFO [BrokerMetadataPublisher id=2] Updating 
metadata.version to 19 at offset OffsetAndEpoch(offset=229, epoch=2). 
(kafka.server.metadata.BrokerMetadataPublisher)
[2024-01-15 20:45:29,915] INFO [ReplicaFetcherManager on broker 2] Removed 
fetcher for partitions Set(t1-29, t1-25, t1-21, t1-17, t1-46, t1-13, t1-42, 
t1-9, t1-38, t1-5, t1-34, t1-1, t1-30, t1-26, t1-22, t1-18, t1-47, t1-14, 
t1-43, t1-10, t1-39, t1-6, t1-35, t1-2, t1-31, t1-27, t1-23, t1-19, t1-48, 
t1-15, t1-44, t1-11, t1-40, t1-7, t1-36, t1-3, t1-32, t1-28, t1-24, t1-20, 
t1-49, t1-16, t1-45, t1-12, t1-41, t1-8, t1-37, t1-4, t1-33, t1-0) 
(kafka.server.ReplicaFetcherManager)
{code}

Complete log:
https://gist.github.com/showuon/665aa3ce6afd59097a2662f8260ecc10

Steps:
1. Start up a 3.6 Kafka cluster in KRaft mode with 1 broker
2. Create a topic
3. Upgrade the binary to 3.7
4. Use kafka-features.sh to upgrade to the 3.7 metadata version
5. Check the log (and metrics if interested)

Analysis:
In 3.7, we have JBOD support in KRaft, so the partitionRegistration gained a 
new directory field. This causes a diff to be found when comparing the delta. 
We might be able to recognize that a change which only adds the directory does 
not need to reset the leader/follower state and only needs a metadata update, 
which would avoid the unavailability. 
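
To illustrate the idea in the analysis, here is a minimal sketch of a 
comparison that ignores a directory-only change. It does not use Kafka's 
actual classes: Registration, needsStateReset and isMetadataOnlyChange are 
hypothetical names, the directories are modelled as plain strings, and the 
choice of which fields affect the leader/follower state is an assumption.

```java
import java.util.Arrays;

// Hypothetical, simplified model; these are NOT Kafka's real classes.
final class PartitionDiffSketch {

    // Stand-in for a partition registration with the new directory field.
    static final class Registration {
        final int leader;
        final int leaderEpoch;
        final int[] replicas;
        final String[] directories; // assumed representation of the JBOD field

        Registration(int leader, int leaderEpoch, int[] replicas, String[] directories) {
            this.leader = leader;
            this.leaderEpoch = leaderEpoch;
            this.replicas = replicas;
            this.directories = directories;
        }
    }

    // True only if a field assumed to affect leader/follower roles differs.
    // The directories field is deliberately left out of this comparison.
    static boolean needsStateReset(Registration prev, Registration next) {
        return prev.leader != next.leader
            || prev.leaderEpoch != next.leaderEpoch
            || !Arrays.equals(prev.replicas, next.replicas);
    }

    // True if the registrations differ only in the directories field.
    static boolean isMetadataOnlyChange(Registration prev, Registration next) {
        return !needsStateReset(prev, next)
            && !Arrays.equals(prev.directories, next.directories);
    }

    public static void main(String[] args) {
        // Same leader, epoch and replicas; only the directories were filled in.
        Registration before = new Registration(1, 5, new int[] {1, 2, 3}, new String[0]);
        Registration after = new Registration(1, 5, new int[] {1, 2, 3},
            new String[] {"dir-a", "dir-b", "dir-c"});

        System.out.println("reset needed: " + needsStateReset(before, after));
        System.out.println("metadata-only update: " + isMetadataOnlyChange(before, after));
    }
}
```

With a check along these lines, a delta that only fills in directories could 
be applied as a metadata update without removing and re-adding the fetchers.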




