Jun Rao created KAFKA-18625:
-------------------------------
Summary: consumer client could get duplicated records if assigned
partitions change quickly
Key: KAFKA-18625
URL: https://issues.apache.org/jira/browse/KAFKA-18625
Project: Kafka
Issue Type: Bug
Components: consumer
Reporter: Jun Rao
When a partition is unassigned from a consumer, we don't immediately clear the
records buffered for it in the client. Instead, when the client next calls poll(),
[FetchCollector.fetchRecords()|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/FetchCollector.java#L156-L158]
calls {{nextInLineFetch.drain()}} to discard the fetched data for that
partition if the partition is no longer assigned. In the common case, the
buffered data for an unassigned partition is drained before the partition is
assigned back to the same consumer.
However, in rare cases the following sequence seems possible, at least in
theory: (1) partition1 is assigned to client1; (2) a CompletedFetch for
partition1 is buffered in client1; (3) partition1 is unassigned from client1
and reassigned to client2; (4) client2 fetches and consumes the same data that
was buffered in step (2); (5) partition1 is reassigned back to client1 before
client1 calls poll() again; (6) client1 calls poll() and, because partition1 is
assigned to it again, consumes the data buffered in step (2) instead of
draining it, so data already consumed by client2 is returned a second time.
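
The six steps above can be reproduced in a small toy model. This is only a sketch to make the interleaving concrete: {{ToyConsumer}}, {{BufferedFetch}}, {{DuplicateDeliveryDemo}} and the {{clearOnUnassign}} flag are hypothetical names invented for illustration. They are not Kafka's client internals and not a proposed patch. The model assumes only what is described above: a buffered fetch is checked against the current assignment at poll() time rather than at unassignment time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical classes, not Kafka's real consumer internals.
final class ToyConsumer {
    // One buffered fetch result: its partition and the record offsets it holds.
    static final class BufferedFetch {
        final String partition;
        final List<Long> offsets;
        BufferedFetch(String partition, List<Long> offsets) {
            this.partition = partition;
            this.offsets = offsets;
        }
    }

    private final Set<String> assigned = new HashSet<>();
    private final Deque<BufferedFetch> buffer = new ArrayDeque<>();
    private final boolean clearOnUnassign; // assumption: illustrative switch only

    ToyConsumer(boolean clearOnUnassign) { this.clearOnUnassign = clearOnUnassign; }

    void assign(String partition) { assigned.add(partition); }

    void unassign(String partition) {
        assigned.remove(partition);
        if (clearOnUnassign) {
            // Eager variant: drop buffered data as soon as ownership is lost.
            buffer.removeIf(f -> f.partition.equals(partition));
        }
    }

    void bufferFetch(String partition, List<Long> offsets) {
        buffer.add(new BufferedFetch(partition, offsets));
    }

    // Lazy check: ownership is only tested here, when the data is handed out.
    List<Long> poll() {
        List<Long> out = new ArrayList<>();
        while (!buffer.isEmpty()) {
            BufferedFetch f = buffer.poll();
            if (assigned.contains(f.partition)) {
                out.addAll(f.offsets);
            } // otherwise the fetch is drained (discarded)
        }
        return out;
    }
}

final class DuplicateDeliveryDemo {
    // Returns the offsets that both clients handed to their applications.
    static List<Long> deliveredTwice(boolean clearOnUnassign) {
        ToyConsumer client1 = new ToyConsumer(clearOnUnassign);
        ToyConsumer client2 = new ToyConsumer(clearOnUnassign);
        List<Long> batch = List.of(0L, 1L, 2L);

        client1.assign("partition1");             // (1)
        client1.bufferFetch("partition1", batch); // (2)
        client1.unassign("partition1");           // (3) client1 does not poll here
        client2.assign("partition1");
        client2.bufferFetch("partition1", batch); // (4) client2 fetches the same offsets
        List<Long> seenByClient2 = client2.poll();
        client2.unassign("partition1");           // (5) partition moves back
        client1.assign("partition1");
        List<Long> seenByClient1 = client1.poll(); // (6) stale buffer is returned

        List<Long> duplicates = new ArrayList<>(seenByClient1);
        duplicates.retainAll(seenByClient2);
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println("lazy drain:  " + deliveredTwice(false)); // [0, 1, 2]
        System.out.println("eager clear: " + deliveredTwice(true));  // []
    }
}
```

With the lazy check, all three offsets are delivered by both clients. The eager variant is included only as a contrast, to show that the duplicates in this model come from the stale buffer surviving the unassign/reassign cycle.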