Igor Soarez created KAFKA-15650:
-----------------------------------
Summary: Data-loss on leader shutdown right after partition creation?
Key: KAFKA-15650
URL: https://issues.apache.org/jira/browse/KAFKA-15650
Project: Kafka
Issue Type: Sub-task
Reporter: Igor Soarez
As per KIP-858, when a replica is created, the broker selects a log directory
to host the replica and queues the propagation of the directory assignment to
the controller. The replica becomes active immediately; it is not blocked until
the controller confirms the metadata change. If the replica is the leader
replica, it can start accepting writes right away.
Consider the following scenario:
# A partition is created in some selected log directory, and some produce
traffic is accepted
# Before the broker is able to notify the controller of the directory
assignment, the broker shuts down
# Upon coming back online, the broker has an offline directory, the same
directory which was chosen to host the replica
# The broker assumes leadership for the replica, but cannot find it in any
available directory and has no way of knowing it was already created because
the directory assignment is still missing
# The replica is created anew, empty, and the previously produced records are lost
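The scenario above can be sketched as a toy model. This is only an illustration under stated assumptions: the class `DirAssignmentRace`, its fields, the directory names `d1`/`d2`, and the partition name `t-0` are all hypothetical and are not Kafka's actual classes or behaviour. The broker is modelled as a map of log directories, and the controller as a map of known directory assignments that survives the broker restart.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the scenario; hypothetical names, not Kafka's actual classes.
class DirAssignmentRace {
    // dir -> (partition -> records); LinkedHashMap keeps directory order stable.
    final Map<String, Map<String, List<String>>> dirs = new LinkedHashMap<>();
    final Set<String> offlineDirs = new HashSet<>();
    // What the controller knows: partition -> dir. Survives broker restarts.
    final Map<String, String> controllerAssignments = new HashMap<>();

    DirAssignmentRace(String... dirNames) {
        for (String d : dirNames) dirs.put(d, new HashMap<>());
    }

    // Finds the replica's log in an online dir, or creates an empty one in the
    // first online dir. Returns null when the controller-known dir is offline,
    // i.e. the replica is known to exist but is unreachable.
    List<String> getOrCreateReplica(String partition) {
        String known = controllerAssignments.get(partition);
        if (known != null && offlineDirs.contains(known)) return null;
        for (Map.Entry<String, Map<String, List<String>>> e : dirs.entrySet()) {
            if (!offlineDirs.contains(e.getKey()) && e.getValue().containsKey(partition)) {
                return e.getValue().get(partition);
            }
        }
        for (Map.Entry<String, Map<String, List<String>>> e : dirs.entrySet()) {
            if (!offlineDirs.contains(e.getKey())) {
                List<String> log = new ArrayList<>();
                e.getValue().put(partition, log);
                return log;
            }
        }
        throw new IllegalStateException("no online log directory");
    }

    // Returns the number of records visible after the restart, or -1 if the
    // broker declines to recreate the replica because its dir is known offline.
    static int simulate(boolean assignmentPropagated) {
        DirAssignmentRace broker = new DirAssignmentRace("d1", "d2");
        List<String> log = broker.getOrCreateReplica("t-0"); // step 1: created in d1
        log.add("r1");
        log.add("r2");
        if (assignmentPropagated) broker.controllerAssignments.put("t-0", "d1");
        // Step 2: the broker shuts down. Step 3: d1 is offline after the restart.
        broker.offlineDirs.add("d1");
        List<String> after = broker.getOrCreateReplica("t-0"); // steps 4 and 5
        return after == null ? -1 : after.size();
    }

    public static void main(String[] args) {
        System.out.println("assignment lost:       " + simulate(false));
        System.out.println("assignment propagated: " + simulate(true));
    }
}
```

In this model, `simulate(false)` returns 0: the replica is silently recreated empty in `d2` and the two records in `d1` are no longer visible. `simulate(true)` returns -1: because the assignment reached the controller, the broker can tell that the replica lives in an offline directory and does not recreate it.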
Step 4 may seem unlikely because ISR membership gates leadership, but even
assuming acks=all and replicas>1, if all other replicas are also offline, the
broker may still gain leadership. Perhaps KIP-966 is relevant here.
We may need to delay new replica activation until the assignment is propagated
successfully.
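One way to picture delayed activation is a replica that rejects appends until a future representing the controller's acknowledgement has completed. This is a minimal sketch, not a proposed implementation: `GatedReplica`, `tryAppend`, and the use of a `CompletableFuture` to stand in for the propagation result are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: a replica that stays inactive until the directory
// assignment is acknowledged. Not Kafka's actual replica implementation.
class GatedReplica {
    // Completed normally once the controller confirms the directory assignment.
    private final CompletableFuture<Void> assignmentAcked;
    private final List<String> log = new ArrayList<>();

    GatedReplica(CompletableFuture<Void> assignmentAcked) {
        this.assignmentAcked = assignmentAcked;
    }

    // Active only after the acknowledgement completed without an error.
    boolean isActive() {
        return assignmentAcked.isDone() && !assignmentAcked.isCompletedExceptionally();
    }

    // Returns true if the record was accepted, false if the replica is not yet active.
    synchronized boolean tryAppend(String record) {
        if (!isActive()) return false;
        log.add(record);
        return true;
    }

    synchronized int size() {
        return log.size();
    }

    public static void main(String[] args) {
        CompletableFuture<Void> ack = new CompletableFuture<>();
        GatedReplica replica = new GatedReplica(ack);
        System.out.println("before ack: " + replica.tryAppend("r1")); // rejected
        ack.complete(null); // the controller confirms the assignment
        System.out.println("after ack:  " + replica.tryAppend("r1")); // accepted
    }
}
```

The trade-off in this sketch is added latency on partition creation: no write is accepted until the assignment is durable in the controller's metadata, so a restart with the hosting directory offline can no longer hide records that were already acknowledged.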