Adrian Preston created KAFKA-14074:
--------------------------------------
Summary: Restarting a broker during re-assignment can leave log
directory entries
Key: KAFKA-14074
URL: https://issues.apache.org/jira/browse/KAFKA-14074
Project: Kafka
Issue Type: Bug
Affects Versions: 3.1.0, 2.8.0
Reporter: Adrian Preston
Re-starting a broker while replicas are being assigned away from it can
result in topic partition directories being left in the broker’s log directory.
This can trigger further problems if such a topic is deleted and re-created.
These problems occur when replicas for the new topic are placed on a broker
that hosts a “stale” topic partition directory of the same name, causing the
on-disk topic partition state held by different brokers in the cluster to
diverge.
We have also been able to re-produce variants of this problem using Kafka 2.8
and 3.1, as well as Kafka built from the head of the apache/kafka repository
(at the time of writing this is commit:
94d4fdeb28b3cd4d474d943448a7ef653eaa145d). We have *not* been able to
re-produce this problem with Kafka running in KRaft mode.
A minimal re-create for topic directories being left on disk is as follows (a
scripted sketch of these steps appears after the list):
# Start ZooKeeper and a broker (both using the sample config)
# Create 100 topics: each with 1 partition, and with replication factor 1
# Add a second broker to the Kafka cluster (with minor edits to the sample
config for: {{broker.id}}, {{listeners}}, and {{log.dirs}})
# Issue a re-assignment that moves all of the topic partition replicas from
the first broker to the second broker
# While this re-assignment is taking place, shut down the first broker (you
need to be quick with only two brokers and 100 topics…)
# Wait a few seconds for the re-assignment to stall
# Restart the first broker, then wait for the re-assignment to complete and
for the broker to remove any partially deleted topics (e.g. those with a
“-delete” suffix).
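For convenience, here is a rough scripting of steps 2, 4, and 5 above. All
paths, ports, and topic names are illustrative, and {{/tmp/kafka-logs}}
assumes the sample config’s default {{log.dirs}}:
{code:bash}
# Step 2: create 100 single-partition topics with replication factor 1
for i in $(seq -f '%03g' 0 99); do
  bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
    --topic test-$i --partitions 1 --replication-factor 1
done

# Step 4: build a reassignment file that moves every replica to the
# second broker (broker.id=1), then execute it
{
  printf '{"version":1,"partitions":['
  for i in $(seq -f '%03g' 0 99); do
    [ "$i" != "000" ] && printf ','
    printf '{"topic":"test-%s","partition":0,"replicas":[1]}' "$i"
  done
  printf ']}'
} > move-to-broker-1.json

bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file move-to-broker-1.json --execute

# Step 5: while the reassignment is in flight, stop the first broker
# (e.g. Ctrl-C in its terminal, or kill its PID)

# After steps 6 and 7, inspect the first broker's log directory for
# leftover topic partition directories:
ls /tmp/kafka-logs | grep '^test-'
{code}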
Inspecting the logs directory for the first broker should show directories
corresponding to topic partitions that are owned by the second broker. These
are not cleaned up when the re-assignment completes, and also remain in the
logs directory even if the first broker is restarted. Deleting the topic also
does not clean up the topic partitions left behind on the first broker - which
leads to a second potential problem.
For topics that have more than one replica: a new topic that has the same name
as a previously deleted topic might have replicas created on a broker with
“stale” topic partition directories. If this happens these topics will remain
in an under-replicated state.
A minimal re-create for this is as follows (again, a sketch of the
reassignment files appears after the list):
# Create a three-node Kafka cluster (backed by ZK) based on the sample config
(to avoid confusion, let’s call these kafka-0, kafka-1, and kafka-2)
# Create 100 topics: each with 1 partition, and with replication factor 2
# Submit a re-assignment to move all of the topic partition replicas to
kafka-0 and kafka-1, and wait for it to complete
# Submit a re-assignment to move all of the topic partition replicas on
kafka-0 to kafka-2.
# While this re-assignment is taking place, shut down and re-start kafka-0.
# Wait for the re-assignment to complete, and check that there are unexpected
topic partition directories in kafka-0’s logs directory
# Delete all 100 topics, and re-create 100 new topics with the same name and
configuration as the deleted topics.
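A sketch of the reassignment files used in steps 3 and 4, assuming broker ids
0, 1, and 2 correspond to kafka-0, kafka-1, and kafka-2 (only one topic entry
is shown per file; the real files list all 100 topics):
{code:bash}
# Step 3: place both replicas of each topic on kafka-0 and kafka-1
cat > move-to-0-1.json <<'EOF'
{"version":1,"partitions":[
  {"topic":"test-000","partition":0,"replicas":[0,1]}
]}
EOF
bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file move-to-0-1.json --execute

# Step 4: move the replica held by kafka-0 over to kafka-2
cat > move-0-to-2.json <<'EOF'
{"version":1,"partitions":[
  {"topic":"test-000","partition":0,"replicas":[2,1]}
]}
EOF
bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file move-0-to-2.json --execute

# Step 5: restart kafka-0 while this second reassignment is in flight
{code}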
In this state kafka-1 and kafka-2 continually generate log messages similar to:
{{[2022-07-14 13:07:49,118] WARN [ReplicaFetcher replicaId=2, leaderId=0,
fetcherId=0] Received INCONSISTENT_TOPIC_ID from the leader for partition
test-039-0. This error may be returned transiently when the partition is being
created or deleted, but it is not expected to persist.
(kafka.server.ReplicaFetcherThread)}}
Topics that have had replicas created on kafka-0 are under-replicated, with
kafka-0 missing from the ISR list. Performing a rolling restart of each broker
in turn does not resolve the problem; in fact, more partitions are listed as
under-replicated, and as before kafka-0 is missing from their ISR lists.
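The under-replicated state is easy to observe with the stock tooling:
{code:bash}
# lists partitions whose ISR is smaller than the replica set; for the
# affected topics, broker 0 (kafka-0) is absent from the Isr column
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe \
  --under-replicated-partitions
{code}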
I also tried to re-create this with Kafka running in KRaft mode, but was
unable to do so. My test configuration was three brokers configured based on
/config/kraft/server.properties. All three brokers were part of the controller
quorum. Interestingly, I see log lines like the following when re-starting the
broker that I stopped mid-reassignment:
{{[2022-07-14 13:44:42,705] INFO Found stray log dir
Log(dir=/tmp/kraft-2/test-029-0, topicId=DMGA3zxyQqGUfeV6cmkcmg,
topic=test-029, partition=0, highWatermark=0, lastStableOffset=0,
logStartOffset=0, logEndOffset=0): the current replica assignment [I@530d4c70
does not contain the local brokerId 2.
(kafka.server.metadata.BrokerMetadataPublisher$)}}
Later log lines show the topic being deleted. Looking at the corresponding
code: KRaft mode explicitly checks that the topic ID on disk matches the
expected value, and deletes the directory if it does not.
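For what it’s worth, the topic-ID divergence behind the INCONSISTENT_TOPIC_ID
errors in the ZK-mode re-create can be checked by hand: the
{{partition.metadata}} file in a stale directory records the topic ID the
directory was created with (brokers write this file from 2.8 onward), while
{{kafka-topics.sh --describe}} reports the ID of the re-created topic. A
sketch, with illustrative paths and topic name:
{code:bash}
# topic ID recorded in the stale on-disk directory on kafka-0
cat /tmp/kafka-logs/test-039-0/partition.metadata

# topic ID the rest of the cluster now expects for the same name
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe \
  --topic test-039
{code}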