[ https://issues.apache.org/jira/browse/KAFKA-9987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108705#comment-17108705 ]
Travis Bischel edited comment on KAFKA-9987 at 5/31/20, 11:19 PM:
------------------------------------------------------------------

For context, here are my current benchmarks (WithExisting mirrors an existing cluster rejoining; Imbalanced means unequal subscriptions):

{noformat}
BenchmarkLarge
BenchmarkLarge: sticky_test.go:1272: 24104 total partitions; 100 total members
BenchmarkLarge: sticky_test.go:1272: 24104 total partitions; 100 total members
BenchmarkLarge-12                            100      11918236 ns/op     7121221 B/op      9563 allocs/op
BenchmarkLargeWithExisting
BenchmarkLargeWithExisting: sticky_test.go:1272: 24104 total partitions; 100 total members
BenchmarkLargeWithExisting: sticky_test.go:1272: 24104 total partitions; 100 total members
BenchmarkLargeWithExisting: sticky_test.go:1272: 24104 total partitions; 100 total members
BenchmarkLargeWithExisting-12                 74      16180851 ns/op     9605267 B/op     34015 allocs/op
BenchmarkLargeImbalanced
BenchmarkLargeImbalanced: sticky_test.go:1272: 24104 total partitions; 101 total members
BenchmarkLargeImbalanced: sticky_test.go:1272: 24104 total partitions; 101 total members
BenchmarkLargeImbalanced-12                   68      17798614 ns/op    17025139 B/op      9995 allocs/op
BenchmarkLargeWithExistingImbalanced
BenchmarkLargeWithExistingImbalanced: sticky_test.go:1272: 24104 total partitions; 101 total members
BenchmarkLargeWithExistingImbalanced: sticky_test.go:1272: 24104 total partitions; 101 total members
BenchmarkLargeWithExistingImbalanced-12       74      15852596 ns/op     9602434 B/op     33806 allocs/op
{noformat}

Switching up some numbers to better mirror this issue's problem statement:

{noformat}
BenchmarkLarge
BenchmarkLarge: sticky_test.go:1272: 4274 total partitions; 2100 total members
BenchmarkLarge: sticky_test.go:1272: 4274 total partitions; 2100 total members
BenchmarkLarge: sticky_test.go:1272: 4274 total partitions; 2100 total members
BenchmarkLarge-12                              3     447516434 ns/op    13942640 B/op     10619 allocs/op
BenchmarkLargeWithExisting
BenchmarkLargeWithExisting: sticky_test.go:1272: 4274 total partitions; 2100 total members
BenchmarkLargeWithExisting: sticky_test.go:1272: 4274 total partitions; 2100 total members
BenchmarkLargeWithExisting: sticky_test.go:1272: 4274 total partitions; 2100 total members
BenchmarkLargeWithExisting-12                  3     460263266 ns/op    14482474 B/op     27700 allocs/op
BenchmarkLargeImbalanced
BenchmarkLargeImbalanced: sticky_test.go:1272: 4274 total partitions; 2101 total members
BenchmarkLargeImbalanced: sticky_test.go:1272: 4274 total partitions; 2101 total members
BenchmarkLargeImbalanced: sticky_test.go:1272: 4274 total partitions; 2101 total members
BenchmarkLargeImbalanced-12                    3     487361276 ns/op    50107610 B/op     10636 allocs/op
BenchmarkLargeWithExistingImbalanced
BenchmarkLargeWithExistingImbalanced: sticky_test.go:1272: 4274 total partitions; 2101 total members
BenchmarkLargeWithExistingImbalanced: sticky_test.go:1272: 4274 total partitions; 2101 total members
BenchmarkLargeWithExistingImbalanced: sticky_test.go:1272: 4274 total partitions; 2101 total members
BenchmarkLargeWithExistingImbalanced-12        3     459259448 ns/op    14482096 B/op     27695 allocs/op
{noformat}

More extreme:

{noformat}
BenchmarkLarge
BenchmarkLarge: sticky_test.go:1272: 1276057 total partitions; 1000 total members
BenchmarkLarge-12                              1    1889004419 ns/op   430359568 B/op    829830 allocs/op
BenchmarkLargeWithExisting
BenchmarkLargeWithExisting: sticky_test.go:1272: 1276057 total partitions; 1000 total members
BenchmarkLargeWithExisting-12                  1    3086791088 ns/op   617969240 B/op   2516550 allocs/op
BenchmarkLargeImbalanced
BenchmarkLargeImbalanced: sticky_test.go:1272: 1276057 total partitions; 1001 total members
BenchmarkLargeImbalanced-12                    1   32948262382 ns/op  5543028064 B/op    830336 allocs/op
BenchmarkLargeWithExistingImbalanced
BenchmarkLargeWithExistingImbalanced: sticky_test.go:1272: 1276057 total partitions; 1001 total members
BenchmarkLargeWithExistingImbalanced-12        1    5206902130 ns/op   617954512 B/op   2515084 allocs/op
{noformat}

Note that the prior case uses quite a bit of RAM (~5-6G), but it is also balancing quite a lot of partitions among quite a lot of members; the planning itself only took ~0.5G, and setup was the expensive part.

1 topic, 2100 partitions, 2100 members:

{noformat}
BenchmarkLargeWithExisting-12                448       3424827 ns/op
{noformat}
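For anyone who wants to reproduce numbers of this shape: these come from ordinary Go benchmarks run with {{-benchmem}}; the repeated log lines above are just {{b.Logf}} firing once per calibration pass of the benchmark. Below is a minimal sketch of such a harness, where the member/partition counts and the {{balance}} entry point are hypothetical stand-ins for whatever assignor is under test:

{noformat}
package sticky

import (
	"fmt"
	"testing"
)

// GroupMember is a hypothetical stand-in for a member's join metadata.
type GroupMember struct {
	ID     string
	Topics []string
}

// balance is a stand-in for the assignor under test: it maps each member ID
// to the partitions it is assigned.
func balance(members []GroupMember, topics map[string]int) map[string][]string {
	return nil // substitute the real assignor here
}

func BenchmarkLarge(b *testing.B) {
	// Every member shares one subscription: ~24k partitions over 200 topics,
	// 100 members (counts chosen to roughly mirror the first run above).
	const nMembers, nTopics, partitionsPer = 100, 200, 120
	topics := make(map[string]int, nTopics)
	names := make([]string, 0, nTopics)
	for i := 0; i < nTopics; i++ {
		name := fmt.Sprintf("topic-%d", i)
		topics[name] = partitionsPer
		names = append(names, name)
	}
	members := make([]GroupMember, nMembers)
	for i := range members {
		members[i] = GroupMember{ID: fmt.Sprintf("member-%d", i), Topics: names}
	}
	b.Logf("%d total partitions; %d total members", nTopics*partitionsPer, nMembers)

	b.ResetTimer() // keep setup cost out of the measured loop
	for i := 0; i < b.N; i++ {
		balance(members, topics)
	}
}
{noformat}

Running {{go test -bench . -benchmem}} produces the ns/op, B/op, and allocs/op columns shown above.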
> Improve sticky partition assignor algorithm
> -------------------------------------------
>
>                 Key: KAFKA-9987
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9987
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients
>            Reporter: Sophie Blee-Goldman
>            Assignee: Sophie Blee-Goldman
>            Priority: Major
>
> In [KIP-429|https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol] we added the new CooperativeStickyAssignor, which leverages the underlying sticky assignment algorithm of the existing StickyAssignor (moved to AbstractStickyAssignor). The algorithm is fairly complex, as it tries to optimize stickiness while satisfying perfect balance _in the case that individual consumers may be subscribed to different subsets of the topics._ While it does a pretty good job at what it promises to do, it doesn't scale well with large numbers of consumers and partitions.
> To give a concrete example, users have reported that it takes 2.5 minutes for the assignment to complete with just 2100 consumers reading from 2100 partitions. Since partitions revoked during the first of two cooperative rebalances will remain unassigned until the end of the second rebalance, it's important for the rebalance to be as fast as possible. And since one of the primary improvements of the cooperative rebalancing protocol is a better scaling experience, the only OOTB cooperative assignor should not itself scale poorly.
> If we can constrain the problem a bit, we can simplify the algorithm greatly. In many cases the individual consumers won't be subscribed to some random subset of the total subscription; they will all be subscribed to the same set of topics and rely on the assignor to balance the partition workload.
> We can detect this case by checking the group's individual subscriptions and calling on a more efficient assignment algorithm; a sketch of such a check follows below.
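On that last point, here is a rough sketch of what the detection and dispatch could look like, in Go to match the benchmarks above. All types and names here are hypothetical, and a real implementation would also preserve prior ownership for stickiness rather than dealing partitions out blindly; this only shows the fast-path check:

{noformat}
package sticky

import "sort"

// GroupMember is a hypothetical stand-in for a member's join metadata.
type GroupMember struct {
	ID     string
	Topics []string
}

// subscriptionsIdentical reports whether every member subscribes to exactly
// the same set of topics -- the common case this issue wants to fast-path.
func subscriptionsIdentical(members []GroupMember) bool {
	if len(members) == 0 {
		return true
	}
	want := make(map[string]struct{}, len(members[0].Topics))
	for _, t := range members[0].Topics {
		want[t] = struct{}{}
	}
	for _, m := range members[1:] {
		if len(m.Topics) != len(want) {
			return false
		}
		for _, t := range m.Topics {
			if _, ok := want[t]; !ok {
				return false
			}
		}
	}
	return true
}

// generalAssign is a placeholder for the existing, fully general algorithm.
func generalAssign(members []GroupMember, partitions []string) map[string][]string {
	return nil // the expensive path stays as-is
}

// assign dispatches: identical subscriptions get a cheap O(partitions)
// spread, everything else falls back to the general algorithm.
func assign(members []GroupMember, partitions []string) map[string][]string {
	if !subscriptionsIdentical(members) {
		return generalAssign(members, partitions)
	}
	ids := make([]string, 0, len(members))
	for _, m := range members {
		ids = append(ids, m.ID)
	}
	sort.Strings(ids) // deterministic order across rebalances
	plan := make(map[string][]string, len(ids))
	for i, p := range partitions {
		id := ids[i%len(ids)]
		plan[id] = append(plan[id], p)
	}
	return plan
}
{noformat}

The check itself is linear in the total subscription size, so it costs next to nothing relative to either assignment path, and the uniform-subscription path is linear in the partition count.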