[
https://issues.apache.org/jira/browse/KAFKA-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540696#comment-16540696
]
Dong Lin commented on KAFKA-7126:
---------------------------------
[[email protected]] Thanks for the comment. Given the understanding of the
root cause, I am not sure if we can solve the problem by adding jitter.
> Reduce number of rebalance period for large consumer groups after a topic is
> created
> ------------------------------------------------------------------------------------
>
> Key: KAFKA-7126
> URL: https://issues.apache.org/jira/browse/KAFKA-7126
> Project: Kafka
> Issue Type: Improvement
> Reporter: Dong Lin
> Assignee: Dong Lin
> Priority: Major
>
> For a group of 200 MirrorMaker consumers with pattern-based topic
> subscription, a single topic creation caused 50 rebalances for each of these
> consumers over a 5-minute period. This causes the MM to lag significantly
> behind during this 5-minute period, and the clusters may be considerably
> out-of-sync during this period.
> Ideally we would like to trigger only 1 rebalance in the MM group after a
> topic is created. And conceptually it should be doable.
>
> Here is the explanation of this repeated consumer rebalance based on the
> consumer rebalance logic in the latest Kafka code:
> 1) A topic of 10 partitions is created in the cluster and it matches the
> subscription pattern of the MM consumers.
> 2) The leader of the MM consumer group detects the new topic after a metadata
> refresh. It triggers a rebalance.
> 3) At time T0, the first rebalance finishes. 10 consumers are each assigned 1
> partition of this topic. The other 190 consumers are not assigned any
> partition of this topic. At this moment, the newly created topic will appear
> in `ConsumerCoordinator.subscriptions.subscription` for those consumers that
> are assigned a partition of this topic or that have refreshed metadata before
> time T0.
> 4) In the common case, half of the consumers have refreshed metadata before
> the leader of the consumer group refreshed metadata. Thus around 100 + 10 =
> 110 consumers have the newly created topic in
> `ConsumerCoordinator.subscriptions.subscription`. The other 90 consumers do
> not have this topic in `ConsumerCoordinator.subscriptions.subscription`.
> 5) For those 90 consumers, if any consumer refreshes metadata, it will add
> this topic to `ConsumerCoordinator.subscriptions.subscription`, which causes
> `ConsumerCoordinator.rejoinNeededOrPending()` to return true and triggers
> another rebalance. If a few consumers refresh metadata almost at the same
> time, they will jointly trigger one rebalance. Otherwise, they each trigger a
> separate rebalance.
> 6) The default metadata.max.age.ms is 5 minutes. Thus in the worst case,
> which is probably also the average case if the number of consumers in the
> group is large, the last consumer will refresh its metadata 5 minutes after
> T0. And the rebalances will be repeated during this 5-minute interval.
>
>
>
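The repeated-rebalance sequence in steps 4) to 6) above can be illustrated with a rough standalone simulation. This is only a sketch and does not use any Kafka classes: the class name `RebalanceSketch`, the methods `countRebalances` and `randomRefreshTimes`, the 3-second join window, and the uniform spread of refresh times are all assumptions made for illustration. It models the roughly 90 consumers that have not yet seen the new topic as each refreshing metadata at a random time within metadata.max.age.ms after T0, and it treats refreshes that fall within the join window of the refresh that started a rebalance as jointly triggering that one rebalance.

```java
import java.util.Arrays;
import java.util.Random;

public class RebalanceSketch {

    // Counts rebalances triggered by metadata refreshes at the given times
    // (milliseconds after T0). Assumption: a refresh that happens within
    // joinWindowMs of the refresh that started the current rebalance joins
    // that rebalance instead of triggering a separate one.
    static int countRebalances(long[] refreshTimesMs, long joinWindowMs) {
        long[] sorted = refreshTimesMs.clone();
        Arrays.sort(sorted);
        int rebalances = 0;
        long windowStart = 0L;
        for (long t : sorted) {
            if (rebalances == 0 || t - windowStart > joinWindowMs) {
                rebalances++;
                windowStart = t;
            }
        }
        return rebalances;
    }

    // Hypothetical model: each lagging consumer refreshes metadata at a
    // uniformly random time in [0, metadataMaxAgeMs) after T0.
    static long[] randomRefreshTimes(int consumers, long metadataMaxAgeMs, long seed) {
        Random random = new Random(seed);
        long[] times = new long[consumers];
        for (int i = 0; i < consumers; i++) {
            times[i] = (long) (random.nextDouble() * metadataMaxAgeMs);
        }
        return times;
    }

    public static void main(String[] args) {
        long metadataMaxAgeMs = 5 * 60 * 1000L; // default metadata.max.age.ms
        long joinWindowMs = 3000L;              // assumed, not taken from Kafka
        long[] times = randomRefreshTimes(90, metadataMaxAgeMs, 42L);
        System.out.println("rebalances after T0: "
                + countRebalances(times, joinWindowMs));
    }
}
```

In this model the count depends on how the refresh times are spread over the window relative to the join window, which is why the number of extra rebalances grows with the number of consumers that still have to pick up the new topic.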