RivenSun created KAFKA-15185:
--------------------------------

             Summary: Consumers using the latest strategy may lose data after 
the topic adds partitions
                 Key: KAFKA-15185
                 URL: https://issues.apache.org/jira/browse/KAFKA-15185
             Project: Kafka
          Issue Type: Bug
          Components: consumer
    Affects Versions: 3.4.1
            Reporter: RivenSun
            Assignee: Luke Chen


h2. condition:

1. Partitions are added to a business topic.
2. metadata.max.age.ms is set to five minutes on both producers and consumers, 
but the producer discovers the new partition before the consumer does and 
produces 100 messages to it.
3. The consumer parameter auto.offset.reset is set to latest.
h2. result:

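Consumers will lose these 100 messages: when the consumer finally discovers 
the new partition, the group has no committed offset for it, so the consumer 
resets to the end of the log and skips everything already written.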
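
The race can be illustrated with a toy model. This is only a sketch: it does 
not use the Kafka client, the class and method names are made up for 
illustration, and the partition log is modelled as a plain list.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one newly added partition; no Kafka client is involved.
// All names here are hypothetical and exist only for this illustration.
class NewPartitionLossDemo {

    // Returns how many records the consumer never sees when it resets to "latest".
    static int lostWithLatest(int producedBeforeDiscovery, int producedAfterDiscovery) {
        List<String> log = new ArrayList<>();

        // The producer refreshes its metadata first and writes to the new partition.
        for (int i = 0; i < producedBeforeDiscovery; i++) {
            log.add("early-" + i);
        }

        // The consumer discovers the partition later. There is no committed offset,
        // so auto.offset.reset=latest positions it at the current end of the log.
        int position = log.size();

        // Records produced from now on are consumed normally.
        for (int i = 0; i < producedAfterDiscovery; i++) {
            log.add("late-" + i);
        }
        int consumed = log.size() - position;

        // Everything written before the consumer's starting position is skipped.
        return log.size() - consumed;
    }

    public static void main(String[] args) {
        System.out.println("lost = " + lostWithLatest(100, 50));
    }
}
```

With 100 records written before discovery, the model reports 100 lost records 
no matter how many are produced afterwards.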


First of all, we cannot simply set auto.offset.reset to {*}earliest{*}, 
because users require that a newly subscribed group can discard all old 
messages of the topic.
However, once the group has subscribed, messages produced to a newly added 
partition must not be lost; for such a partition the consumer should behave 
as if it started consuming from the earliest offset.
h2. suggestion:

So far we have set the consumer's metadata.max.age.ms to 1/2 or 1/3 of the 
producer's value.
But this still does not solve the problem, because in many cases the producer 
may force a metadata refresh.
In addition, a smaller metadata.max.age.ms value causes more metadata refresh 
requests, which increases the load on the brokers.
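
The trade-off can be put into rough numbers. The sketch below is a simplified 
model and not Kafka code: it assumes the producer learns about the new 
partition the moment it is created (for example through a forced refresh) and 
that the consumer had refreshed just before, so the consumer waits one full 
interval. It counts periodic refreshes only. The class and method names are 
hypothetical.

```java
// Simplified model of the metadata.max.age.ms workaround; not Kafka code.
// Names are hypothetical and exist only for this illustration.
class MetadataAgeTradeoff {

    // Assumed worst case: the producer sees the new partition immediately,
    // while the consumer has to wait a full refresh interval.
    static long worstCaseWindowMs(long consumerMaxAgeMs) {
        return consumerMaxAgeMs;
    }

    // Periodic metadata refreshes issued by one client per hour.
    static long refreshesPerHour(long maxAgeMs) {
        return 3_600_000L / maxAgeMs;
    }

    public static void main(String[] args) {
        long producerMaxAgeMs = 300_000L;              // five minutes, as in the report
        long consumerMaxAgeMs = producerMaxAgeMs / 3;  // the "1/3" workaround

        System.out.println("window ms: " + worstCaseWindowMs(consumerMaxAgeMs));
        System.out.println("refreshes/hour at 5 min: " + refreshesPerHour(producerMaxAgeMs));
        System.out.println("refreshes/hour at 1/3:   " + refreshesPerHour(consumerMaxAgeMs));
    }
}
```

Under these assumptions the exposure window shrinks from 300000 ms to 100000 
ms but never reaches zero, while each consumer sends three times as many 
periodic metadata requests (36 per hour instead of 12).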

So can we add a parameter that controls whether the consumer starts from the 
earliest or the latest offset for a newly added partition?
Perhaps, during the rebalance, the group leader consumer needs to mark which 
partitions are newly added when it computes the assignment.
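
As a rough sketch of the decision such a parameter might drive: the code below 
is not an existing Kafka API. The enum, the method, and the idea that the 
leader passes along a set of "marked new" partitions are all assumptions made 
for illustration, and partitions are represented by plain integers.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of the proposed behaviour; nothing here is a Kafka API.
class NewPartitionResetSketch {

    enum Reset { COMMITTED, EARLIEST, LATEST }

    // Decides the starting position for each assigned partition:
    // - a committed offset always wins;
    // - a partition the leader marked as newly added starts from earliest;
    // - anything else follows auto.offset.reset=latest, as today.
    static Map<Integer, Reset> decide(Set<Integer> assigned,
                                      Set<Integer> withCommittedOffset,
                                      Set<Integer> markedNew) {
        Map<Integer, Reset> result = new TreeMap<>();
        for (Integer partition : assigned) {
            if (withCommittedOffset.contains(partition)) {
                result.put(partition, Reset.COMMITTED);
            } else if (markedNew.contains(partition)) {
                result.put(partition, Reset.EARLIEST);
            } else {
                result.put(partition, Reset.LATEST);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> assigned = new HashSet<>(Arrays.asList(0, 1, 2, 3));
        Set<Integer> committed = new HashSet<>(Arrays.asList(0, 1));
        Set<Integer> markedNew = new HashSet<>(Arrays.asList(3));
        System.out.println(decide(assigned, committed, markedNew));
    }
}
```

In this example partition 3 is the one added after the group subscribed, so it 
is read from the beginning, while partition 2 (old, but never committed) still 
follows the latest strategy and discards old messages as the users require.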



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
