Gangadharan created KAFKA-18974:
-----------------------------------
Summary: Uneven distribution of topic partitions across consumers
while using Cooperative Sticky Assignor
Key: KAFKA-18974
URL: https://issues.apache.org/jira/browse/KAFKA-18974
Project: Kafka
Issue Type: Bug
Components: clients, consumer
Affects Versions: 3.8.1
Reporter: Gangadharan
I came across a scenario where topic partitions are spread unevenly across
consumer threads. The topic with high TPS (e.g. 85% of traffic) had more
partitions than the topics with low TPS (e.g. 15% of traffic), and the consumer
threads were subscribed to both sets of topics. After assignment, some consumer
threads held more partitions of the low-TPS topics than others. As a result,
the pods whose consumer threads held more partitions of the high-TPS topic had
to do more work, resulting in higher lag. With the round robin assignor, the
distribution is even between threads and across pods, but we are then limited
by its stop-the-world rebalances.

An issue was already raised and fixed in this area (KAFKA-16277), but that fix
does not solve the whole problem. I suspect this is because, during a
rebalance, only the partitions that are supposed to be moved away from existing
consumers are sorted and redistributed. There is no logic that also checks
whether the retained partitions should be moved to ensure an even spread across
consumers.
Related issue: [KAFKA-16277] CooperativeStickyAssignor does not spread topics
evenly among consumer group
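To make the suspicion above concrete, here is a deliberately simplified toy model in Python. It is not the CooperativeStickyAssignor code: the function names, the topics `hot` and `cold`, the consumers `c1` to `c3`, and the starting assignment are all hypothetical. It only illustrates how a rebalance that redistributes just the released partitions, by count alone, can leave a per-topic skew in place.

```python
from collections import Counter


def toy_sticky_rebalance(previous, all_partitions):
    """Toy count-based sticky rebalance (an illustration, NOT Kafka's algorithm).

    Each consumer keeps up to `quota` of the partitions it already owns.
    Only the leftover partitions are handed out, one at a time, to the
    consumer that currently holds the fewest. Topic names are never consulted.
    """
    consumers = sorted(previous)
    quota = -(-len(all_partitions) // len(consumers))  # ceiling division
    assignment = {c: list(previous[c][:quota]) for c in consumers}
    retained = {tp for tps in assignment.values() for tp in tps}
    for tp in sorted(p for p in all_partitions if p not in retained):
        target = min(consumers, key=lambda c: len(assignment[c]))
        assignment[target].append(tp)
    return assignment


def per_topic_counts(assignment):
    """Return {consumer: Counter({topic: number_of_partitions})}."""
    return {c: Counter(topic for topic, _ in tps) for c, tps in assignment.items()}


# Hypothetical starting point: c1 owns all of "hot", c2 owns all of "cold",
# and c3 has just joined the group with nothing assigned.
partitions = [(t, p) for t in ("hot", "cold") for p in range(4)]
previous = {
    "c1": [("hot", p) for p in range(4)],
    "c2": [("cold", p) for p in range(4)],
    "c3": [],
}

result = toy_sticky_rebalance(previous, partitions)
for consumer, counts in per_topic_counts(result).items():
    print(consumer, dict(counts))
# c1 {'hot': 3}
# c2 {'cold': 3}
# c3 {'cold': 1, 'hot': 1}
```

In this toy run the totals are balanced (3, 3 and 2 partitions), yet `c1` still holds three of the four `hot` partitions, because the retained partitions were never reconsidered.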
Below is a sample test:
2 pods with 6 consumer threads each, and two topics with 18 partitions each
(test_topic_1 has higher inflow than test_topicone_1). As the assignments show,
with the cooperative sticky strategy test_topic_1 is concentrated in Pod1
(12 of its 18 partitions), so Pod1 starts to build up lag. With round robin,
test_topic_1 is distributed evenly between the pods (9 partitions each).
Note: The sample test uses the same partition count for both topics for ease
of understanding. The behavior appears to be the same regardless of the topics'
partition counts.
Cooperative Sticky:
Pod1
c--> consumer 1912486590767 [test_topic_1-1, test_topic_1-3, test_topicone_1-1]
c--> consumer 1922696734819 [test_topic_1-11, test_topic_1-6, test_topicone_1-6]
c--> consumer 1941340051228 [test_topic_1-12, test_topic_1-7, test_topicone_1-7]
c--> consumer 1940955938996 [test_topic_1-0, test_topic_1-8, test_topicone_1-0]
c--> consumer 1941837822481 [test_topic_1-2, test_topic_1-9, test_topicone_1-2]
c--> consumer 1942719746188 [test_topic_1-10, test_topic_1-4, test_topicone_1-4]
Pod2
c--> consumer 1941486742305 [test_topic_1-13, test_topicone_1-13, test_topicone_1-5]
c--> consumer 1941837974018 [test_topic_1-14, test_topicone_1-14, test_topicone_1-8]
c--> consumer 1942719897724 [test_topic_1-15, test_topicone_1-15, test_topicone_1-9]
c--> consumer 1942696886353 [test_topic_1-16, test_topicone_1-10, test_topicone_1-16]
c--> consumer 1941340202762 [test_topic_1-17, test_topicone_1-11, test_topicone_1-17]
c--> consumer 1940956090534 [test_topic_1-5, test_topicone_1-12, test_topicone_1-3]
-----------------------------------------------------------------------------------------
Round Robin:
Pod1
c--> consumer 1941408797822 [test_topic_1-0, test_topic_1-12, test_topicone_1-6]
c--> consumer 1941456423553 [test_topic_1-9, test_topicone_1-15, test_topicone_1-3]
c--> consumer 1942070859325 [test_topic_1-14, test_topic_1-2, test_topicone_1-8]
c--> consumer 1941385036886 [test_topic_1-16, test_topic_1-4, test_topicone_1-10]
c--> consumer 1941105638483 [test_topic_1-6, test_topicone_1-0, test_topicone_1-12]
c--> consumer 1941885698382 [test_topic_1-10, test_topicone_1-16, test_topicone_1-4]
Pod2
c--> consumer 1941456538287 [test_topic_1-8, test_topicone_1-14, test_topicone_1-2]
c--> consumer 1942070974058 [test_topic_1-15, test_topic_1-3, test_topicone_1-9]
c--> consumer 1941885813119 [test_topic_1-11, test_topicone_1-17, test_topicone_1-5]
c--> consumer 1941408912555 [test_topic_1-1, test_topic_1-13, test_topicone_1-7]
c--> consumer 1941385151618 [test_topic_1-17, test_topic_1-5, test_topicone_1-11]
c--> consumer 1941105753216 [test_topic_1-7, test_topicone_1-1, test_topicone_1-13]
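As a rough cross-check of the Cooperative Sticky listing above, the following Python sketch tallies partitions per topic for each pod. The helper names (`tally_by_pod`, `load_share`) are hypothetical. The 85%/15% traffic split is only the illustrative figure from the description, applied here on the assumption that traffic is spread evenly over each topic's partitions, so the resulting load shares are an estimate and not a measurement.

```python
import re
from collections import Counter, defaultdict

# Cooperative Sticky assignment copied from the report (wrapped lines joined).
COOPERATIVE_STICKY = """\
Pod1
c--> consumer 1912486590767 [test_topic_1-1, test_topic_1-3, test_topicone_1-1]
c--> consumer 1922696734819 [test_topic_1-11, test_topic_1-6, test_topicone_1-6]
c--> consumer 1941340051228 [test_topic_1-12, test_topic_1-7, test_topicone_1-7]
c--> consumer 1940955938996 [test_topic_1-0, test_topic_1-8, test_topicone_1-0]
c--> consumer 1941837822481 [test_topic_1-2, test_topic_1-9, test_topicone_1-2]
c--> consumer 1942719746188 [test_topic_1-10, test_topic_1-4, test_topicone_1-4]
Pod2
c--> consumer 1941486742305 [test_topic_1-13, test_topicone_1-13, test_topicone_1-5]
c--> consumer 1941837974018 [test_topic_1-14, test_topicone_1-14, test_topicone_1-8]
c--> consumer 1942719897724 [test_topic_1-15, test_topicone_1-15, test_topicone_1-9]
c--> consumer 1942696886353 [test_topic_1-16, test_topicone_1-10, test_topicone_1-16]
c--> consumer 1941340202762 [test_topic_1-17, test_topicone_1-11, test_topicone_1-17]
c--> consumer 1940956090534 [test_topic_1-5, test_topicone_1-12, test_topicone_1-3]
"""

# Matches "<topic>-<partition>", e.g. "test_topic_1-11".
TOPIC_PARTITION = re.compile(r"([A-Za-z0-9_.]+)-(\d+)")


def tally_by_pod(listing):
    """Return {pod: Counter({topic: partition_count})} for a listing like the one above."""
    counts = defaultdict(Counter)
    pod = None
    # Drop Jira bold markers in case the listing is pasted with them.
    for line in listing.replace("{*}", "").splitlines():
        line = line.strip()
        if line.startswith("Pod"):
            pod = line
        elif pod is not None:
            # Only look at the bracketed part (or a wrapped continuation line).
            for topic, _partition in TOPIC_PARTITION.findall(line.split("[", 1)[-1]):
                counts[pod][topic] += 1
    return dict(counts)


def load_share(counts, traffic_share):
    """Estimate each pod's share of traffic, assuming even traffic per partition."""
    totals = Counter()
    for per_topic in counts.values():
        totals.update(per_topic)
    return {
        pod: sum(traffic_share[t] * n / totals[t] for t, n in per_topic.items())
        for pod, per_topic in counts.items()
    }


counts = tally_by_pod(COOPERATIVE_STICKY)
for pod, per_topic in counts.items():
    print(pod, dict(per_topic))
# Pod1 {'test_topic_1': 12, 'test_topicone_1': 6}
# Pod2 {'test_topic_1': 6, 'test_topicone_1': 12}

# Assumed, illustrative traffic split (not measured in the sample test).
shares = load_share(counts, {"test_topic_1": 0.85, "test_topicone_1": 0.15})
for pod, share in shares.items():
    print(pod, round(share, 3))
# Pod1 0.617
# Pod2 0.383
```

Running the Round Robin listing through the same `tally_by_pod` function gives 9 partitions of each topic per pod, which under the same assumptions is an even 50/50 load split.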