Adrian Preston created KAFKA-14074:
--------------------------------------

             Summary: Restarting a broker during re-assignment can leave log directory entries
                 Key: KAFKA-14074
                 URL: https://issues.apache.org/jira/browse/KAFKA-14074
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 3.1.0, 2.8.0
            Reporter: Adrian Preston
Restarting a broker while replicas are being assigned away from it can result in topic partition directories being left behind in the broker’s log directory. This can trigger further problems if such a topic is deleted and re-created: replicas for the new topic may be placed on a broker that still hosts a “stale” topic partition directory of the same name, causing the on-disk topic partition state held by different brokers in the cluster to diverge.

We have been able to reproduce variants of this problem using Kafka 2.8 and 3.1, as well as Kafka built from the head of the apache/kafka repository (at the time of writing this is commit: 94d4fdeb28b3cd4d474d943448a7ef653eaa145d). We have *not* been able to reproduce this problem with Kafka running in KRaft mode.

A minimal reproduction for topic directories being left on disk is as follows:
# Start ZooKeeper and a broker (both using the sample config)
# Create 100 topics: each with 1 partition, and with replication factor 1
# Add a second broker to the Kafka cluster (with minor edits to the sample config for {{broker.id}}, {{listeners}}, and {{log.dirs}})
# Issue a re-assignment that moves all of the topic partition replicas from the first broker to the second broker
# While this re-assignment is taking place, shut down the first broker (you need to be quick with only two brokers and 100 topics…)
# Wait a few seconds for the re-assignment to stall
# Restart the first broker, then wait for the re-assignment to complete and for the broker to remove any partially deleted topics (e.g. those with a “-delete” suffix)

Inspecting the logs directory for the first broker should show directories corresponding to topic partitions that are now owned by the second broker. These are not cleaned up when the re-assignment completes, and they remain in the logs directory even if the first broker is restarted. Deleting the topic does not clean up the topic partition directories left behind on the first broker either, which leads to a second potential problem.
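For reference, the topic creation (step 2) and re-assignment (step 4) can also be driven programmatically with Kafka’s {{Admin}} client rather than the CLI tools. The sketch below is illustrative, not the exact tooling we used: it assumes broker ids 0 and 1, a bootstrap server at localhost:9092, and topic names of the form test-NNN (matching the log extracts later in this report).

{code:java}
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitionReassignment;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.TopicPartition;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

public class ReassignmentRepro {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // Step 2: create 100 single-partition topics with replication factor 1.
            List<NewTopic> topics = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                topics.add(new NewTopic(String.format("test-%03d", i), 1, (short) 1));
            }
            admin.createTopics(topics).all().get();

            // Step 4: re-assign every replica from broker 0 to broker 1.
            Map<TopicPartition, Optional<NewPartitionReassignment>> moves = new HashMap<>();
            for (int i = 0; i < 100; i++) {
                moves.put(new TopicPartition(String.format("test-%03d", i), 0),
                        Optional.of(new NewPartitionReassignment(List.of(1))));
            }
            admin.alterPartitionReassignments(moves).all().get();
            // Step 5 is manual: kill the first broker while the moves are in flight.
        }
    }
}
{code}

The second reproduction below differs only in the replication factor (2 instead of 1) and the target replica lists passed to {{alterPartitionReassignments}}.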
For topics that have more than one replica, a new topic with the same name as a previously deleted topic might have replicas created on a broker holding “stale” topic partition directories. If this happens, these topics will remain in an under-replicated state. A minimal reproduction for this is as follows:
# Create a three-node Kafka cluster (backed by ZK) based off the sample config (to avoid confusion, let’s call the brokers kafka-0, kafka-1, and kafka-2)
# Create 100 topics: each with 1 partition, and with replication factor 2
# Submit a re-assignment to move all of the topic partition replicas to kafka-0 and kafka-1, and wait for it to complete
# Submit a re-assignment to move all of the topic partition replicas on kafka-0 to kafka-2
# While this re-assignment is taking place, shut down and restart kafka-0
# Wait for the re-assignment to complete, and check that there are unexpected topic partition directories in kafka-0’s logs directory
# Delete all 100 topics, then re-create 100 new topics with the same names and configuration as the deleted topics

In this state kafka-1 and kafka-2 continually generate log messages similar to:

{{[2022-07-14 13:07:49,118] WARN [ReplicaFetcher replicaId=2, leaderId=0, fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition test-039-0. This error may be returned transiently when the partition is being created or deleted, but it is not expected to persist. (kafka.server.ReplicaFetcherThread)}}

Topics that have had replicas created on kafka-0 are under-replicated, with kafka-0 missing from the ISR list. Performing a rolling restart of each broker in turn does not resolve the problem; in fact, more partitions become under-replicated, again with kafka-0 missing from their ISR lists.

I also tried to reproduce this with Kafka running in KRaft mode, but was unable to do so. My test configuration was three brokers configured based on /config/kraft/server.properties, with all three brokers part of the controller quorum. Interestingly, I see log lines like the following when restarting the broker that was stopped mid-reassignment:

{{[2022-07-14 13:44:42,705] INFO Found stray log dir Log(dir=/tmp/kraft-2/test-029-0, topicId=DMGA3zxyQqGUfeV6cmkcmg, topic=test-029, partition=0, highWatermark=0, lastStableOffset=0, logStartOffset=0, logEndOffset=0): the current replica assignment [I@530d4c70 does not contain the local brokerId 2. (kafka.server.metadata.BrokerMetadataPublisher$)}}

Later log lines show the topic being deleted. Looking at the corresponding code: KRaft mode explicitly checks that the topic ID on disk matches the expected value, and deletes the directory if it does not.
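For illustration, the KRaft-mode check behaves roughly as follows: a local log directory is treated as a stray when the current metadata either does not know the on-disk topic ID at all, or its replica assignment for that partition no longer contains the local broker id. The sketch below uses hypothetical stand-in types; the real logic lives in kafka.server.metadata.BrokerMetadataPublisher, and this is a simplification of it, not the actual code:

{code:java}
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for the broker's on-disk log state and the
// metadata image's view of partition assignments.
record LocalLog(String topic, int partition, String topicId) {}
record PartitionAssignment(Set<Integer> replicas) {}

final class StrayLogCheck {
    // A log dir is stray when the topic ID on disk is unknown to the current
    // metadata, or the partition's replica assignment excludes this broker.
    static boolean isStray(int localBrokerId,
                           Map<String, Map<Integer, PartitionAssignment>> topicsById,
                           LocalLog log) {
        Map<Integer, PartitionAssignment> partitions = topicsById.get(log.topicId());
        if (partitions == null) {
            return true;  // topic ID on disk matches no live topic
        }
        PartitionAssignment assignment = partitions.get(log.partition());
        if (assignment == null) {
            return true;  // the partition no longer exists
        }
        return !assignment.replicas().contains(localBrokerId);
    }
}
{code}

Because the check keys off the topic ID rather than the topic name, a stale directory such as {{test-029-0}} in the log line above is detected on restart and deleted even if a topic with the same name exists again, which presumably explains why the problem does not reproduce in KRaft mode.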