[
https://issues.apache.org/jira/browse/KAFKA-9987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sophie Blee-Goldman resolved KAFKA-9987.
----------------------------------------
Fix Version/s: 2.5.1, 2.4.2, 2.6.0
Resolution: Fixed
> Improve sticky partition assignor algorithm
> -------------------------------------------
>
> Key: KAFKA-9987
> URL: https://issues.apache.org/jira/browse/KAFKA-9987
> Project: Kafka
> Issue Type: Improvement
> Components: clients
> Reporter: Sophie Blee-Goldman
> Assignee: Sophie Blee-Goldman
> Priority: Major
> Fix For: 2.6.0, 2.4.2, 2.5.1
>
>
> In
> [KIP-429|https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol]
> we added the new CooperativeStickyAssignor, which leverages the underlying
> sticky assignment algorithm of the existing StickyAssignor (moved to
> AbstractStickyAssignor). The algorithm is fairly complex as it tries to
> optimize stickiness while satisfying perfect balance _in the case where
> individual consumers may be subscribed to different subsets of the topics._ While it
> does a pretty good job at what it promises to do, it doesn't scale well with
> large numbers of consumers and partitions.
>
> To give a concrete example, users have reported that it takes 2.5 minutes for
> the assignment to complete with just 2100 consumers reading from 2100
> partitions. Since partitions revoked during the first of two cooperative
> rebalances will remain unassigned until the end of the second rebalance, it's
> important for the rebalance to be as fast as possible. And since one of the
> primary improvements of the cooperative rebalancing protocol is a better
> scaling experience, the only out-of-the-box cooperative assignor should not
> itself scale poorly.
>
> If we can constrain the problem a bit, we can simplify the algorithm greatly.
> In many cases the individual consumers won't be subscribed to some random
> subset of the total subscription; instead, they will all be subscribed to the
> same set of topics and rely on the assignor to balance the partition workload.
> We can detect this case by checking the group's individual subscriptions and
> then calling a more efficient assignment algorithm.
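
The detection step described above could look roughly like the sketch below. This is not the code in AbstractStickyAssignor: the class name ConstrainedAssignSketch and both method names are hypothetical, partitions are represented as plain strings such as "topic-0", and the fast path shown is a plain round-robin that ignores stickiness entirely. It is only meant to illustrate checking whether all members share one subscription before picking a simpler algorithm.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the Kafka code base.
final class ConstrainedAssignSketch {

    // Returns true when every member subscribes to exactly the same set of topics.
    // Keys are member ids, values are the topics that member subscribed to.
    static boolean allSubscriptionsEqual(Map<String, List<String>> subscriptions) {
        Set<String> first = null;
        for (List<String> topics : subscriptions.values()) {
            Set<String> current = new HashSet<>(topics);
            if (first == null) {
                first = current;
            } else if (!first.equals(current)) {
                return false;
            }
        }
        return true;
    }

    // Simplified fast path (assumption: stickiness is ignored here).
    // Deals partitions out round-robin so member counts differ by at most one.
    static Map<String, List<String>> assignBalanced(List<String> members,
                                                    List<String> partitions) {
        Map<String, List<String>> assignment = new TreeMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        if (members.isEmpty()) {
            return assignment;
        }
        for (int i = 0; i < partitions.size(); i++) {
            String owner = members.get(i % members.size());
            assignment.get(owner).add(partitions.get(i));
        }
        return assignment;
    }
}
```

An assignor built this way would call allSubscriptionsEqual first and fall back to the general algorithm whenever it returns false. The check is a single pass over the members, so it adds little cost compared with the assignment itself.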