[jira] [Commented] (KAFKA-12675) Improve sticky general assignor scalability and performance
[ https://issues.apache.org/jira/browse/KAFKA-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325234#comment-17325234 ]

Guozhang Wang commented on KAFKA-12675:
---

Wow, this is super! Thanks [~showuon] [~twmb], please ping me as well as [~ableegoldman] when you think https://github.com/apache/kafka/pull/10552 is ready to review.

> Improve sticky general assignor scalability and performance
> ---
>
> Key: KAFKA-12675
> URL: https://issues.apache.org/jira/browse/KAFKA-12675
> Project: Kafka
> Issue Type: Improvement
> Reporter: Luke Chen
> Assignee: Luke Chen
> Priority: Major
>
> Currently, we have a "general assignor" for the non-equal subscription case and a
> "constrained assignor" for the all-equal subscription case. There's a performance
> test for the constrained assignor with:
> topicCount = 500;
> partitionCount = 2000;
> consumerCount = 2000;
> in _testLargeAssignmentAndGroupWithUniformSubscription_, for a total of 1 million
> partitions, and the assignment completes within 2 seconds on my machine.
> However, if we let 1 of the consumers subscribe to only 1 topic, it'll use the
> "general assignor", and the result with the same settings as above is:
> *OutOfMemory*.
> Even if we lower the counts to:
> topicCount = 50;
> partitionCount = 1000;
> consumerCount = 1000;
> we still get *OutOfMemory*.
> With these settings:
> topicCount = 50;
> partitionCount = 800;
> consumerCount = 800;
> the assignment completes in 10 seconds on my machine, which is still slow.
>
> Since we are going to set the default assignment strategy to
> "CooperativeStickyAssignor" soon, we should improve the scalability and
> performance of the sticky general assignor.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
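The issue description turns on one distinction: whether every consumer subscribes to the same set of topics. The sketch below illustrates that check and the test sizes quoted above. It is not Kafka's code; the class `SubscriptionShape` and its methods are hypothetical names chosen for illustration, and the real assignor's decision logic may differ.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Kafka: models each consumer's subscription
// as a set of topic names and checks whether all subscriptions are equal.
final class SubscriptionShape {

    /** True when every consumer subscribes to exactly the same set of topics. */
    static boolean isUniform(List<Set<String>> subscriptions) {
        if (subscriptions.isEmpty()) {
            return true;
        }
        Set<String> first = subscriptions.get(0);
        for (Set<String> subscription : subscriptions) {
            if (!subscription.equals(first)) {
                return false;
            }
        }
        return true;
    }

    /** Builds consumerCount identical subscriptions over topics t0..t(topicCount-1). */
    static List<Set<String>> uniformGroup(int topicCount, int consumerCount) {
        Set<String> allTopics = new HashSet<>();
        for (int t = 0; t < topicCount; t++) {
            allTopics.add("t" + t);
        }
        List<Set<String>> subscriptions = new ArrayList<>();
        for (int c = 0; c < consumerCount; c++) {
            subscriptions.add(allTopics);
        }
        return subscriptions;
    }

    public static void main(String[] args) {
        // Sizes taken from the issue description.
        int topicCount = 500;
        int partitionCount = 2000; // partitions per topic
        int consumerCount = 2000;
        long totalPartitions = (long) topicCount * partitionCount;

        List<Set<String>> subscriptions = uniformGroup(topicCount, consumerCount);
        System.out.println("total partitions: " + totalPartitions);
        System.out.println("uniform: " + isUniform(subscriptions));

        // Narrow a single consumer to one topic: the group is no longer uniform,
        // which is the case the description says falls to the general assignor.
        subscriptions.set(0, Set.of("t0"));
        System.out.println("uniform after narrowing one consumer: " + isUniform(subscriptions));
    }
}
```

Changing one subscription out of 2000 is enough to flip the result, which is why a single narrowly subscribed consumer moves the whole group onto the slower path.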
[jira] [Commented] (KAFKA-12675) Improve sticky general assignor scalability and performance
[ https://issues.apache.org/jira/browse/KAFKA-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325181#comment-17325181 ]

A. Sophie Blee-Goldman commented on KAFKA-12675:

Nice! It would be great if we could get these improvements in to 3.0, since as Luke mentioned we plan to make this the default assignor.
[jira] [Commented] (KAFKA-12675) Improve sticky general assignor scalability and performance
[ https://issues.apache.org/jira/browse/KAFKA-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324378#comment-17324378 ]

Luke Chen commented on KAFKA-12675:
---

Awesome! I'll also check your code to see how we can improve in KAFKA-12676. Thank you.
[jira] [Commented] (KAFKA-12675) Improve sticky general assignor scalability and performance
[ https://issues.apache.org/jira/browse/KAFKA-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324358#comment-17324358 ]

Travis Bischel commented on KAFKA-12675:

Great insight on getting rid of partition2AllPotentialConsumers, as well as keeping some more things sorted! I was able to translate that into my own code, which dropped the large-imbalance case from 9.5s to 0.5s and memory utilization from 8.5G to 0.5G :) I'll take a more in-depth look at the code soon.
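The thread does not show what partition2AllPotentialConsumers holds or how the PR replaced it. As a rough illustration only, the sketch below assumes it maps every partition to its own list of candidate consumers, and compares that with keeping one list per topic that all partitions of the topic share. `PotentialConsumers` and its methods are hypothetical names, not Kafka APIs. Under that assumption, 500 topics of 2000 partitions each with about 2000 subscribers per topic would need on the order of 2 billion consumer references per partition, against about 1 million per topic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Kafka's implementation.
final class PotentialConsumers {

    /**
     * Inverts consumer -> subscribed topics into topic -> subscribed consumers.
     * One list is kept per topic; every partition of that topic can share it.
     */
    static Map<String, List<String>> byTopic(Map<String, List<String>> subscriptions) {
        Map<String, List<String>> result = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : subscriptions.entrySet()) {
            for (String topic : entry.getValue()) {
                result.computeIfAbsent(topic, t -> new ArrayList<>()).add(entry.getKey());
            }
        }
        return result;
    }

    /** Consumer references stored when there is one list per topic. */
    static long perTopicReferences(Map<String, List<String>> byTopic) {
        long references = 0;
        for (List<String> consumers : byTopic.values()) {
            references += consumers.size();
        }
        return references;
    }

    /** Consumer references stored if each partition kept its own copy of the list. */
    static long perPartitionReferences(Map<String, List<String>> byTopic, int partitionsPerTopic) {
        return perTopicReferences(byTopic) * partitionsPerTopic;
    }

    public static void main(String[] args) {
        Map<String, List<String>> subscriptions = new HashMap<>();
        subscriptions.put("c1", List.of("a", "b"));
        subscriptions.put("c2", List.of("a", "b"));
        subscriptions.put("c3", List.of("a")); // the narrowly subscribed consumer

        Map<String, List<String>> byTopic = byTopic(subscriptions);
        int partitionsPerTopic = 4;
        System.out.println("per-topic references: " + perTopicReferences(byTopic));
        System.out.println("per-partition references: "
                + perPartitionReferences(byTopic, partitionsPerTopic));
    }
}
```

A lookup for a given partition then goes through its topic name, so the per-topic layout trades one extra map access for a footprint that no longer grows with the partition count.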
[jira] [Commented] (KAFKA-12675) Improve sticky general assignor scalability and performance
[ https://issues.apache.org/jira/browse/KAFKA-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324204#comment-17324204 ]

Luke Chen commented on KAFKA-12675:
---

PR: [https://github.com/apache/kafka/pull/10552]

[~twmb], I couldn't find you in the Kafka GitHub repo, but you're welcome to review. Thank you.
[jira] [Commented] (KAFKA-12675) Improve sticky general assignor scalability and performance
[ https://issues.apache.org/jira/browse/KAFKA-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324153#comment-17324153 ]

Travis Bischel commented on KAFKA-12675:

Yep, that's as I understood it :). For me, running _testLargeAssignmentAndGroupWithUniformSubscription_ with a single extra consumer that consumes from one topic (causing an imbalance) results in my balancing algorithm averaging 9.5s per balance.
[jira] [Commented] (KAFKA-12675) Improve sticky general assignor scalability and performance
[ https://issues.apache.org/jira/browse/KAFKA-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324150#comment-17324150 ]

Luke Chen commented on KAFKA-12675:
---

[~twmb], I didn't phrase it clearly. The 5 seconds is for a single run of the sticky assignor, while the _testLargeAssignmentAndGroupWithUniformSubscription_ test actually runs the sticky assignor twice, which takes around 10 seconds. Anyway, I'd be happy if you could review my PR and give some advice. Thank you.
[jira] [Commented] (KAFKA-12675) Improve sticky general assignor scalability and performance
[ https://issues.apache.org/jira/browse/KAFKA-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324139#comment-17324139 ]

Travis Bischel commented on KAFKA-12675:

Interesting, I'm looking forward to seeing the changes, since 5s with the large imbalance beats my 9.5s!
[jira] [Commented] (KAFKA-12675) Improve sticky general assignor scalability and performance
[ https://issues.apache.org/jira/browse/KAFKA-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324137#comment-17324137 ]

Luke Chen commented on KAFKA-12675:
---

[~ableegoldman], agreed! In this ticket, I will improve the scalability and performance *via code refactoring, keeping the same algorithm.* In KAFKA-12676, we'll improve the underlying algorithm to see if the performance can be improved further.

So far, I've refactored the code and rewritten some methods, with these results:

1. Originally, with this setting:
topicCount = 50;
partitionCount = 800;
consumerCount = 800;
the assignment completed in 10 seconds. After my refactoring, the time is *down to 200 ms*.

2. With the 1 million partitions setting:
topicCount = 500;
partitionCount = 2000;
consumerCount = 2000;
OutOfMemory is no longer thrown, and the assignment takes 5 seconds.

I think the improvement is pretty good. I'll wrap up the code and send a PR later. Next, we can implement KAFKA-12676 to see if the performance gets better still.

Thank you.
[jira] [Commented] (KAFKA-12675) Improve sticky general assignor scalability and performance
[ https://issues.apache.org/jira/browse/KAFKA-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324090#comment-17324090 ]

A. Sophie Blee-Goldman commented on KAFKA-12675:

[~twmb] would you be interested in submitting a PR for your algorithm? You'd need to translate it (and tests) into Java, but other than that it seems pretty much complete.
[jira] [Commented] (KAFKA-12675) Improve sticky general assignor scalability and performance
[ https://issues.apache.org/jira/browse/KAFKA-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324089#comment-17324089 ]

A. Sophie Blee-Goldman commented on KAFKA-12675:

Yeah just to clarify, what [~twmb] proposed would not require a KIP since it's just an improvement to the existing (somewhat lacking) algorithm for the general case. There shouldn't be any public facing impact, except perhaps for the memory consumption. But since the current algorithm can't even handle the partition counts he described testing, I'd still consider this an improvement across the board.
[jira] [Commented] (KAFKA-12675) Improve sticky general assignor scalability and performance
[ https://issues.apache.org/jira/browse/KAFKA-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322608#comment-17322608 ]

Luke Chen commented on KAFKA-12675:
---

[~twmb], I see. I've created another ticket, KAFKA-12676, to address your suggestion. We can make incremental improvements there. Thanks for the suggestion!
[jira] [Commented] (KAFKA-12675) Improve sticky general assignor scalability and performance
[ https://issues.apache.org/jira/browse/KAFKA-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322603#comment-17322603 ]

Travis Bischel commented on KAFKA-12675:

What I mean to say is that the logic powering the existing cooperative-sticky algorithm is heuristic and not truly balanced, and that the logic itself can be changed to be more exact to the cooperative-sticky goals while being much more efficient. That is, changes can be made for the imbalanced case similar to how [~ableegoldman] made changes to the balanced case, and these changes will more exactly fulfill the goal of cooperative sticky while being more efficient. This does not change how things are balanced / it does not change the actual sticky aspect.

Basically, improving the underlying algorithm for the imbalanced case directly fulfills the goals of this ticket to improve the scalability and performance.

I'll edit this comment shortly with some benchmarking numbers.
[jira] [Commented] (KAFKA-12675) Improve sticky general assignor scalability and performance
[ https://issues.apache.org/jira/browse/KAFKA-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322599#comment-17322599 ]

Luke Chen commented on KAFKA-12675:
---

[~twmb], thanks for the suggestion. But in this ticket, I'd like to improve the scalability and performance first. I agree the whole algorithm can be improved too, but that would need to go through a KIP and more discussion before it can move forward. Thanks anyway, I'll think about it. :)
[jira] [Commented] (KAFKA-12675) Improve sticky general assignor scalability and performance
[ https://issues.apache.org/jira/browse/KAFKA-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322593#comment-17322593 ]

Travis Bischel commented on KAFKA-12675:

An option to evaluate is the algorithm I devised in my franz-go client, which translates the balancing into a graph and uses A* search to perform an exact balance much more efficiently. I noticed that the existing Java algorithm is heuristic based, and I have a few tests in my repo showing edge cases that the existing heuristic algorithm cannot really handle. The algorithm is here: https://github.com/twmb/franz-go/blob/master/pkg/kgo/internal/sticky/graph.go with the option to switch into that algorithm in the sticky.go file.
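To make the idea of "balancing as a graph search" concrete, here is a small sketch. It is not a translation of franz-go's graph.go and it uses plain breadth-first search rather than A*; the class `StealPathBalancer`, its methods, and the "topic-index" partition naming are all assumptions made for illustration. Members are nodes, and there is an edge from member X to member Y when Y owns a partition X is allowed to consume. A path from an underloaded member to one holding at least two more partitions lets one partition shift along each edge, so only the two endpoints change load.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: BFS over "can steal from" edges between group members.
final class StealPathBalancer {
    private final Map<String, Set<String>> subscriptions; // member -> subscribed topics
    private final Map<String, List<String>> owned;        // member -> owned "topic-index" partitions

    StealPathBalancer(Map<String, Set<String>> subscriptions, Map<String, List<String>> owned) {
        this.subscriptions = subscriptions;
        this.owned = owned; // lists must be mutable; they are updated in place
    }

    static String topicOf(String partition) {
        return partition.substring(0, partition.lastIndexOf('-'));
    }

    /** Shifts one partition at a time until no member can gain from a steal path. */
    void balance() {
        boolean moved = true;
        while (moved) {
            moved = false;
            List<String> members = new ArrayList<>(owned.keySet());
            members.sort(Comparator.comparingInt((String m) -> owned.get(m).size()));
            for (String member : members) { // least-loaded members get the first attempt
                if (stealInto(member)) {
                    moved = true;
                    break;
                }
            }
        }
    }

    /** BFS from receiver; succeeds at the first reachable member owning at least two more. */
    private boolean stealInto(String receiver) {
        int threshold = owned.get(receiver).size() + 2;
        Map<String, String> parent = new HashMap<>(); // member -> who takes from it
        Map<String, String> via = new HashMap<>();    // member -> partition it hands over
        Deque<String> queue = new ArrayDeque<>();
        parent.put(receiver, receiver);
        queue.add(receiver);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String other : owned.keySet()) {
                if (parent.containsKey(other)) continue;
                String partition = stealable(current, other);
                if (partition == null) continue;
                parent.put(other, current);
                via.put(other, partition);
                if (owned.get(other).size() >= threshold) {
                    // Walk back to the receiver, moving one partition per edge.
                    for (String node = other; !node.equals(receiver); node = parent.get(node)) {
                        owned.get(node).remove(via.get(node));
                        owned.get(parent.get(node)).add(via.get(node));
                    }
                    return true;
                }
                queue.add(other);
            }
        }
        return false;
    }

    /** A partition owned by owner that taker is subscribed to, or null if none. */
    private String stealable(String taker, String owner) {
        for (String partition : owned.get(owner)) {
            if (subscriptions.get(taker).contains(topicOf(partition))) return partition;
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> subs = new HashMap<>();
        subs.put("A", Set.of("t"));
        subs.put("B", Set.of("t", "u"));
        subs.put("C", Set.of("u"));
        Map<String, List<String>> owned = new HashMap<>();
        owned.put("A", new ArrayList<>());
        owned.put("B", new ArrayList<>(List.of("t-0")));
        owned.put("C", new ArrayList<>(List.of("u-0", "u-1", "u-2")));
        new StealPathBalancer(subs, owned).balance();
        System.out.println(owned);
    }
}
```

In the `main` example, A cannot consume anything C owns, so a direct move is impossible; the search finds the two-hop path A, B, C, and C hands a `u` partition to B while B hands `t-0` to A. Each BFS here rescans every member's partitions, so this sketch is far from the efficiency discussed in the thread; it only shows the shape of the search.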