[jira] [Updated] (KAFKA-7126) Reduce number of rebalance period for large consumer groups after a topic is created

2018-07-26 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin updated KAFKA-7126:

Fix Version/s: 2.1.0, 2.0.0

> Reduce number of rebalance period for large consumer groups after a topic is 
> created
> 
>
> Key: KAFKA-7126
> URL: https://issues.apache.org/jira/browse/KAFKA-7126
> Project: Kafka
> Issue Type: Improvement
> Reporter: Dong Lin
> Assignee: Jon Lee
> Priority: Major
> Fix For: 2.0.0, 2.1.0
>
> Attachments: 1.diff
>
>
> For a group of 200 MirrorMaker consumers with pattern-based topic 
> subscription, a single topic creation caused 50 rebalances for each of these 
> consumers over a 5-minute period. This causes the MM to lag significantly 
> during those 5 minutes, and the clusters may be considerably out of sync 
> during this period.
> Ideally we would like to trigger only 1 rebalance in the MM group after a 
> topic is created, and conceptually this should be doable.
>  
> Here is an explanation of the repeated consumer rebalances, based on the 
> consumer rebalance logic in the latest Kafka code:
> 1) A topic with 10 partitions is created in the cluster, and it matches the 
> subscription pattern of the MM consumers.
> 2) The leader of the MM consumer group detects the new topic after a metadata 
> refresh and triggers a rebalance.
> 3) At time T0, the first rebalance finishes. 10 consumers are each assigned 1 
> partition of this topic. The other 190 consumers are not assigned any 
> partition of this topic. At this moment, the newly created topic appears in 
> `ConsumerCoordinator.subscriptions.subscription` for those consumers that 
> were assigned a partition of this topic or that refreshed metadata before 
> time T0.
> 4) In the common case, half of the consumers have refreshed metadata before 
> the leader of the consumer group refreshed its metadata. Thus around 
> 100 + 10 = 110 consumers have the newly created topic in 
> `ConsumerCoordinator.subscriptions.subscription`. The other 90 consumers do 
> not have this topic in `ConsumerCoordinator.subscriptions.subscription`.
> 5) When any of those 90 consumers refreshes metadata, it adds this topic to 
> `ConsumerCoordinator.subscriptions.subscription`, which causes 
> `ConsumerCoordinator.rejoinNeededOrPending()` to return true and triggers 
> another rebalance. If a few consumers refresh metadata at almost the same 
> time, they jointly trigger one rebalance. Otherwise, each triggers a 
> separate rebalance.
> 6) The default metadata.max.age.ms is 5 minutes. Thus in the worst case, 
> which is probably also the average case if the number of consumers in the 
> group is large, the last consumer refreshes its metadata 5 minutes after T0, 
> and rebalances keep recurring throughout this 5-minute interval.
>  
>  
>  
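
Steps 3 to 5 above can be illustrated with a toy model. The sketch below is
not Kafka's code: the class `ToyConsumer`, its methods, and the topic names are
hypothetical, and it assumes that an assignment updates both the subscription
and the snapshot sent with the last join, so that only a later metadata refresh
makes a stale member ask to rejoin.

```python
import re


class ToyConsumer:
    """Toy model of one pattern-subscribed group member (not Kafka's classes)."""

    def __init__(self, pattern):
        self.pattern = re.compile(pattern)
        self.subscription = set()         # topics the member currently subscribes to
        self.joined_subscription = set()  # snapshot taken at the last join

    def on_metadata_refresh(self, cluster_topics):
        # A refresh adds every cluster topic that matches the pattern.
        self.subscription |= {t for t in cluster_topics if self.pattern.fullmatch(t)}

    def on_assignment(self, assigned_topics):
        # Assumption: being assigned a partition adds the topic to both sets,
        # so the assignment itself does not request another rebalance.
        self.subscription |= set(assigned_topics)
        self.joined_subscription |= set(assigned_topics)

    def rejoin_needed(self):
        # The member asks to rejoin when its subscription has drifted
        # from what it sent when it last joined.
        return self.subscription != self.joined_subscription

    def join_group(self):
        self.joined_subscription = set(self.subscription)


# A member that was not assigned the new topic and has not refreshed yet.
stale = ToyConsumer(r"mm\..*")
stale.join_group()
stale.on_metadata_refresh(["mm.new-topic", "unrelated"])
print(stale.rejoin_needed())  # the late refresh requests another rebalance
```

In this model a member that received a partition of the new topic stays quiet
on its next refresh, while each stale member requests a rejoin the first time
it sees the topic.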



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-7126) Reduce number of rebalance period for large consumer groups after a topic is created

2018-07-01 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin updated KAFKA-7126:

Description: 
For a group of 200 MirrorMaker consumers with pattern-based topic subscription, 
a single topic creation caused 50 rebalances for each of these consumers over a 
5-minute period. This causes the MM to lag significantly during those 5 minutes, 
and the clusters may be considerably out of sync during this period.

Ideally we would like to trigger only 1 rebalance in the MM group after a topic 
is created, and conceptually this should be doable.

 

Here is an explanation of the repeated consumer rebalances, based on the 
consumer rebalance logic in the latest Kafka code:

1) A topic with 10 partitions is created in the cluster, and it matches the 
subscription pattern of the MM consumers.

2) The leader of the MM consumer group detects the new topic after a metadata 
refresh and triggers a rebalance.

3) At time T0, the first rebalance finishes. 10 consumers are each assigned 1 
partition of this topic. The other 190 consumers are not assigned any partition 
of this topic. At this moment, the newly created topic appears in 
`ConsumerCoordinator.subscriptions.subscription` for those consumers that were 
assigned a partition of this topic or that refreshed metadata before time T0.

4) In the common case, half of the consumers have refreshed metadata before the 
leader of the consumer group refreshed its metadata. Thus around 100 + 10 = 110 
consumers have the newly created topic in 
`ConsumerCoordinator.subscriptions.subscription`. The other 90 consumers do not 
have this topic in `ConsumerCoordinator.subscriptions.subscription`.

5) When any of those 90 consumers refreshes metadata, it adds this topic to 
`ConsumerCoordinator.subscriptions.subscription`, which causes 
`ConsumerCoordinator.rejoinNeededOrPending()` to return true and triggers 
another rebalance. If a few consumers refresh metadata at almost the same time, 
they jointly trigger one rebalance. Otherwise, each triggers a separate 
rebalance.

6) The default metadata.max.age.ms is 5 minutes. Thus in the worst case, which 
is probably also the average case if the number of consumers in the group is 
large, the last consumer refreshes its metadata 5 minutes after T0, and 
rebalances keep recurring throughout this 5-minute interval.
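
To get a rough feel for steps 5 and 6, the sketch below counts rebalances under 
two assumptions that are not in the report: the 90 stale consumers refresh 
metadata at independent, uniformly random times within metadata.max.age.ms 
after T0, and refreshes that land within a fixed window of a triggering refresh 
share one rebalance. The function names and the `coalesce_window_s` parameter 
are hypothetical. It is a back-of-the-envelope model, not a reproduction of the 
observed 50 rebalances.

```python
import random


def count_rebalances(refresh_times, coalesce_window_s):
    """Count rebalances when refreshes close to a trigger share its rebalance."""
    rebalances = 0
    window_end = None
    for t in sorted(refresh_times):
        # A refresh after the current window has closed triggers a new rebalance.
        if window_end is None or t > window_end:
            rebalances += 1
            window_end = t + coalesce_window_s
    return rebalances


def simulate(num_stale=90, max_age_s=300.0, coalesce_window_s=3.0, seed=0):
    """Draw one uniformly random refresh time per stale consumer, then count."""
    rng = random.Random(seed)
    refresh_times = [rng.uniform(0.0, max_age_s) for _ in range(num_stale)]
    return count_rebalances(refresh_times, coalesce_window_s)


if __name__ == "__main__":
    # Sweep the assumed coalescing window to see how the count changes.
    for window in (0.0, 1.0, 3.0, 10.0, 300.0):
        print(window, simulate(coalesce_window_s=window))
```

With a zero window every stale consumer triggers its own rebalance, and with a 
window as long as metadata.max.age.ms they all collapse into one, which is the 
single follow-up rebalance the issue asks for.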

 

 

 

  was:
For a group of 200 MirrorMaker consumers with pattern-based topic subscription, 
a single topic creation caused 50 rebalances for each of these consumers over a 
5-minute period. This causes the MM to lag significantly during those 5 minutes, 
and the clusters may be considerably out of sync during this period.

Ideally we would like to trigger only 1 rebalance in the MM group after a topic 
is created, and conceptually this should be doable.

 

Here is an explanation of the repeated consumer rebalances, based on the 
consumer rebalance logic in the latest Kafka code:

1) A topic with 10 partitions is created in the cluster, and it matches the 
subscription pattern of the MM consumers.

2) The leader of the MM consumer group detects the new topic after a metadata 
refresh and triggers a rebalance.

3) At time T0, the first rebalance finishes. 10 consumers are each assigned 1 
partition of this topic. The other 190 consumers are not assigned any partition 
of this topic. At this moment, the newly created topic appears in 
`ConsumerCoordinator.subscriptions.subscription` for those consumers that were 
assigned a partition of this topic or that refreshed metadata before time T0.

4) In the common case, half of the consumers have refreshed metadata before the 
leader of the consumer group refreshed its metadata. Thus around 100 + 10 = 110 
consumers have the newly created topic in 
`ConsumerCoordinator.subscriptions.subscription`. The other 90 consumers do not 
have this topic in `ConsumerCoordinator.subscriptions.subscription`.

5) When any of those 90 consumers refreshes metadata, it adds this topic to 
`ConsumerCoordinator.subscriptions.subscription`, which causes 
`ConsumerCoordinator.needRejoin()` to return true and triggers another 
rebalance. If a few consumers refresh metadata at almost the same time, they 
jointly trigger one rebalance. Otherwise, each triggers a separate rebalance.

6) The default metadata.max.age.ms is 5 minutes. Thus in the worst case, which 
is probably also the average case if the number of consumers in the group is 
large, the last consumer refreshes its metadata 5 minutes after T0, and 
rebalances keep recurring throughout this 5-minute interval.

 

 

 


> Reduce number of rebalance period for large consumer groups after a topic is 
> created
> 
>
> Key: KAFKA-7126
> URL: 
