[
https://issues.apache.org/jira/browse/KAFKA-20035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chia-Ping Tsai updated KAFKA-20035:
-----------------------------------
Description:
Currently, when a consumer group is configured with {{auto.offset.reset =
latest}}, dynamically adding new partitions to a subscribed topic can lead to
data loss due to a race condition.
The scenario is as follows:
# A group subscribes to a topic with {{auto.offset.reset = latest}}.
# The topic is expanded (e.g., from 3 to 4 partitions).
# Producers immediately start writing data to the new partition (Partition 3).
# The Group Coordinator detects the change and assigns Partition 3 to a member.
# The member initializes the partition. Since there is no committed offset, it
applies the {{latest}} policy.
# *Result: Any messages written to Partition 3 between step 3 and step 5 are
skipped and lost.*
From a user's perspective, {{latest}} should mean "start consuming from the
point of subscription," not "skip data from newly created infrastructure."
was:
Currently, when a consumer group is configured with {{auto.offset.reset =
latest}}, dynamically adding new partitions to a subscribed topic can lead to
data loss due to a race condition.
The scenario is as follows:
# A group subscribes to a topic with {{auto.offset.reset = latest}}.
# The topic is expanded (e.g., from 3 to 4 partitions).
# Producers immediately start writing data to the new partition (Partition 3).
# The Group Coordinator detects the change and assigns Partition 3 to a member.
# The member initializes the partition. Since there is no committed offset, it
applies the {{latest}} policy.
# *Result:* Any messages written to Partition 3 between step 3 and step 5 are
skipped and lost.
From a user's perspective, {{latest}} should mean "start consuming from the
point of subscription," not "skip data from newly created infrastructure."
> Prevent data loss during partition expansion by enforcing "earliest" offset
> reset for dynamically added partitions
> ------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-20035
> URL: https://issues.apache.org/jira/browse/KAFKA-20035
> Project: Kafka
> Issue Type: Bug
> Reporter: Chia-Ping Tsai
> Assignee: Chia-Ping Tsai
> Priority: Critical
>
> Currently, when a consumer group is configured with {{auto.offset.reset =
> latest}}, dynamically adding new partitions to a subscribed topic can lead
> to data loss due to a race condition.
> The scenario is as follows:
> # A group subscribes to a topic with {{auto.offset.reset = latest}}.
> # The topic is expanded (e.g., from 3 to 4 partitions).
> # Producers immediately start writing data to the new partition (Partition
> 3).
> # The Group Coordinator detects the change and assigns Partition 3 to a
> member.
> # The member initializes the partition. Since there is no committed offset,
> it applies the {{latest}} policy.
> # *Result: Any messages written to Partition 3 between step 3 and step 5 are
> skipped and lost.*
> From a user's perspective, {{latest}} should mean "start consuming from the
> point of subscription," not "skip data from newly created infrastructure."
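
The race in steps 2 to 5 can be sketched with a small in-memory model. This is
only an illustration under stated assumptions: a plain list stands in for the
new partition's log, and {{PartitionExpansionRace}}, {{ResetPolicy}},
{{initialPosition}} and {{skippedRecords}} are hypothetical names, not Kafka
APIs. It shows how many records fall behind the starting position when the
member initializes after producers have already written.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: none of these names are Kafka APIs.
final class PartitionExpansionRace {

    enum ResetPolicy { EARLIEST, LATEST }

    // Position chosen for a partition that has no committed offset.
    static long initialPosition(ResetPolicy policy, long logStartOffset, long logEndOffset) {
        return policy == ResetPolicy.EARLIEST ? logStartOffset : logEndOffset;
    }

    // Returns how many records in the new partition the member never reads.
    static long skippedRecords(ResetPolicy policy, int recordsWrittenBeforeInit) {
        // Step 2: the new partition starts out empty.
        List<String> partition3 = new ArrayList<>();

        // Step 3: producers write before any member owns the partition.
        for (int i = 0; i < recordsWrittenBeforeInit; i++) {
            partition3.add("record-" + i);
        }

        // Steps 4-5: the member is assigned the partition, finds no committed
        // offset, and applies the reset policy.
        long position = initialPosition(policy, 0L, partition3.size());

        // Records at offsets [0, position) are never fetched.
        return position;
    }
}
```

With five records written before initialization,
{{skippedRecords(ResetPolicy.LATEST, 5)}} returns 5 and
{{skippedRecords(ResetPolicy.EARLIEST, 5)}} returns 0, which is the behavior
the issue title proposes for dynamically added partitions.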
--
This message was sent by Atlassian Jira
(v8.20.10#820010)