[
https://issues.apache.org/jira/browse/KAFKA-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309276#comment-17309276
]
hudeqi commented on KAFKA-12478:
--------------------------------
OK. If I submit a patch, would you prefer that I modify the logic on the
client side or on the server side?
> Consumer group may lose data for newly expanded partitions when add
> partitions for topic if the group is set to consume from the latest
> ---------------------------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-12478
> URL: https://issues.apache.org/jira/browse/KAFKA-12478
> Project: Kafka
> Issue Type: Improvement
> Components: clients
> Affects Versions: 2.7.0
> Reporter: hudeqi
> Priority: Blocker
> Labels: patch
> Original Estimate: 1,158h
> Remaining Estimate: 1,158h
>
> This problem surfaced in our production environment, where a topic is used
> to carry monitoring data. *After the topic's partitions were expanded, the
> consuming business team reported data loss.*
> A preliminary investigation showed that the lost data is concentrated
> entirely in the newly added partitions. The cause: when partitions are added
> on the server, the producer notices the expansion first and writes some data
> to the new partitions. The consumer group notices the expansion later, and
> once the rebalance completes, it starts consuming the new partitions from
> the latest offset if it is configured to consume from the latest. Data
> written to the new partitions during that window is therefore skipped and
> lost to the consumer.
> Configuring the group to consume from the earliest is not a practical
> workaround for a high-volume topic: on startup the group would read a large
> amount of historical data from the brokers, which would affect broker
> performance to some extent. Therefore, *only the newly added partitions need
> to be consumed from the earliest.*
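
To illustrate the race described in the issue, here is a minimal,
self-contained sketch. It does not use the Kafka client API: the class
NewPartitionResetDemo, its methods, and the record names m0..m3 are all
hypothetical, and the partition log is modelled as a plain list whose index
plays the role of the offset. It only shows why a consumer that resets to the
latest offset misses records the producer wrote before the rebalance finished.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation, not Kafka code: a list stands in for one newly
// added partition, and the list index stands in for the record offset.
public class NewPartitionResetDemo {

    // The producer notices the new partition first and writes three records
    // before the consumer group has been rebalanced.
    public static List<String> newPartitionLog() {
        List<String> log = new ArrayList<>();
        log.add("m0");
        log.add("m1");
        log.add("m2");
        return log;
    }

    // Starting position for a partition that has no committed offset:
    // offset 0 for "earliest", the current log end for "latest".
    public static int startOffset(List<String> log, boolean resetToEarliest) {
        return resetToEarliest ? 0 : log.size();
    }

    // Records the consumer sees when it is assigned the partition only after
    // the three records above have already been written.
    public static List<String> consume(boolean resetToEarliest) {
        List<String> log = newPartitionLog();
        int position = startOffset(log, resetToEarliest);
        log.add("m3"); // written after the consumer was assigned the partition
        return new ArrayList<>(log.subList(position, log.size()));
    }

    public static void main(String[] args) {
        System.out.println("latest:   " + consume(false)); // latest:   [m3]
        System.out.println("earliest: " + consume(true));  // earliest: [m0, m1, m2, m3]
    }
}
```

With the latest reset, m0 to m2 are never delivered, which matches the loss
reported for the new partitions. Resetting only those partitions to the
earliest offset recovers them without replaying the pre-existing partitions.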
--
This message was sent by Atlassian Jira
(v8.3.4#803005)