[jira] [Comment Edited] (KAFKA-16361) Rack aware sticky assignor minQuota violations

2024-05-14 Thread A. Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846467#comment-17846467
 ] 

A. Sophie Blee-Goldman edited comment on KAFKA-16361 at 5/15/24 12:24 AM:
--

Thanks, I think it's safe to say this is related to the rack-aware assignment 
code that was added in 3.5. It's probably the same issue that [~flashmouse] 
found in KAFKA-15170.

Fortunately I just merged that fix and cherry-picked it back to 3.7, so the 
patch should be included in both the upcoming 3.8 release and the 3.7.1 bugfix 
release, whenever that happens. I'm not sure of the timing for 3.7.1, but 3.8 
is just a day from KIP freeze, which means that, if all goes well, it will be 
available in a little over a month.

If you need an immediate resolution in the meantime, you have two options:

1) Disable rack-awareness, which will effectively make the assignor skip over 
the buggy code (see the config sketch below).

2) If you can build from source and don't require an official release, 
cherry-pick [this fix|https://github.com/apache/kafka/pull/13965] onto a branch 
of whatever version you'd like to use and compile it yourself. I wouldn't 
recommend building directly from trunk for a production environment, since 
that contains untested code, but you can at least rerun your test against the 
latest trunk build to make sure it fixes the issue you're experiencing (a quick 
version check is sketched below). I'm pretty confident it will, though.
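
For option 1, here is a minimal sketch of the consumer side (assuming 
rack-awareness was enabled by setting client.rack on your consumers; the 
bootstrap server, group id, and deserializers below are placeholders):

{code:java}
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.StickyAssignor;
import org.apache.kafka.common.serialization.StringDeserializer;

public class NonRackAwareStickyConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");         // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, StickyAssignor.class.getName());

        // The rack-aware path in the sticky assignor only kicks in when consumers
        // advertise a rack id, so leaving client.rack unset (or removing it) keeps
        // the group on the older, non-rack-aware assignment logic:
        // props.put(ConsumerConfig.CLIENT_RACK_CONFIG, "rack-1");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // subscribe and poll as usual
        }
    }
}
{code}

And if you do rebuild the clients jar to rerun your test, a quick way (also 
just a sketch) to confirm which kafka-clients build the test actually picked 
up is to print the version/commit metadata baked into the jar:

{code:java}
import org.apache.kafka.common.utils.AppInfoParser;

public class PrintClientVersion {
    public static void main(String[] args) {
        // Prints e.g. "3.8.0-SNAPSHOT (commit <sha>)" for a locally built trunk jar
        System.out.println(AppInfoParser.getVersion() + " (commit " + AppInfoParser.getCommitId() + ")");
    }
}
{code}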


was (Author: ableegoldman):
Thanks, I think it's safe to say this is related to the rack-aware assignment 
code that was added in 3.5. Probably the same issue that [~flashmouse] found in 
[KAFKA-15170|https://issues.apache.org/jira/browse/KAFKA-15170]

 

Fortunately I just merged that fix and cherrypicked it back to 3.7, so the 
patch should be included in both the upcoming 3.8 release and the 3.7.1 bugfix 
release, whenever that happens. Not sure of the timing for 3.7.1 but 3.8 is 
just a day from KIP freeze which means if all goes well, it will be available 
in a little over a month.

 

If you need an immediate resolution in the meantime then you have two options:

1) disable rack-awareness which will effectively make the assignor just skip 
over the buggy code

2) if you can build from source and don't require an official release, just 
cherrypick [this fix|https://github.com/apache/kafka/pull/13965] to a branch 
with whatever version you'd like to use and compile it yourself. I wouldn't 
recommend building directly from trunk for a production environment since that 
contains untested code, but you can at least run your test again using the 
latest trunk build if you want to make sure that it fixes the issue you're 
experiencing. I'm pretty confident it will though

> Rack aware sticky assignor minQuota violations
> --
>
> Key: KAFKA-16361
> URL: https://issues.apache.org/jira/browse/KAFKA-16361
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 3.5.1, 3.7.0, 3.6.1
>Reporter: Luke D
>Priority: Major
> Attachments: illegalstateexception.log
>
>
> In some low topic-replication scenarios, the rack-aware assignment in the 
> StickyAssignor fails to balance consumers to its own expectations and throws 
> an IllegalStateException, commonly crashing the application (depending on the 
> application's implementation). While uncommon, the error is deterministic, 
> and so it persists until the replication state changes.
>  
> We have observed this in the wild in 3.5.1 and 3.6.1. We have reproduced it 
> locally in a test case in 3.6.1 and 3.7.0 (we did not try 3.5.1, but it would 
> likely also be reproducible there).
>  
> Here is the error and stack trace from our test case against 3.7.0:
> {code:java}
> We haven't reached the expected number of members with more than the minQuota 
> partitions, but no more partitions to be assigned
> java.lang.IllegalStateException: We haven't reached the expected number of 
> members with more than the minQuota partitions, but no more partitions to be 
> assigned
>     at 
> org.apache.kafka.clients.consumer.internals.AbstractStickyAssignor$ConstrainedAssignmentBuilder.verifyUnfilledMembers(AbstractStickyAssignor.java:820)
>     at 
> org.apache.kafka.clients.consumer.internals.AbstractStickyAssignor$ConstrainedAssignmentBuilder.build(AbstractStickyAssignor.java:652)
>     at 
> org.apache.kafka.clients.consumer.internals.AbstractStickyAssignor.assignPartitions(AbstractStickyAssignor.java:113)
>     at 
> org.apache.kafka.clients.consumer.internals.AbstractPartitionAssignor.assign(AbstractPartitionAssignor.java:91)
>  {code}
> Here is a specific test case from 3.7.0 that fails when passed to 
> StickyAssignor.assign:
> {code:java}
> Cluster(id = cluster-id, nodes = [host-3:1 (id: 4 rack: rack-3), host-3:1 
> (id: 3 rack: rack-3), host-2:1 

[jira] [Comment Edited] (KAFKA-16361) Rack aware sticky assignor minQuota violations

2024-05-13 Thread li xiangyuan (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846139#comment-17846139
 ] 

li xiangyuan edited comment on KAFKA-16361 at 5/14/24 2:39 AM:
---

and I also created a Jira issue


was (Author: flashmouse):
and I also create a Jira 
[issue|https://issues.apache.org/jira/browse/KAFKA-15170]

[jira] [Comment Edited] (KAFKA-16361) Rack aware sticky assignor minQuota violations

2024-05-13 Thread li xiangyuan (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846139#comment-17846139
 ] 

li xiangyuan edited comment on KAFKA-16361 at 5/14/24 2:39 AM:
---

and I also created a Jira 
[issue|https://issues.apache.org/jira/browse/KAFKA-15170]


was (Author: flashmouse):
and I also created a Jira issue

[jira] [Comment Edited] (KAFKA-16361) Rack aware sticky assignor minQuota violations

2024-05-13 Thread li xiangyuan (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846139#comment-17846139
 ] 

li xiangyuan edited comment on KAFKA-16361 at 5/14/24 2:39 AM:
---

and I also created a Jira 
[issue|https://issues.apache.org/jira/browse/KAFKA-15170]


was (Author: flashmouse):
and I also create a[ Jira 
issue|https://issues.apache.org/jira/browse/KAFKA-15170]

[jira] [Comment Edited] (KAFKA-16361) Rack aware sticky assignor minQuota violations

2024-03-14 Thread BDeus (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827134#comment-17827134
 ] 

BDeus edited comment on KAFKA-16361 at 3/14/24 3:47 PM:


Is this a regression related to the feature introduced in KAFKA-14450 
(https://issues.apache.org/jira/browse/KAFKA-14450)?


was (Author: baz33):
Is it a regression related to this feature 
[https://issues.apache.org/jira/browse/KAFKA-14450|http://example.com] ?

[jira] [Comment Edited] (KAFKA-16361) Rack aware sticky assignor minQuota violations

2024-03-14 Thread BDeus (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827134#comment-17827134
 ] 

BDeus edited comment on KAFKA-16361 at 3/14/24 3:46 PM:


Is it a regression related to this feature 
[https://issues.apache.org/jira/browse/KAFKA-14450|http://example.com] ?


was (Author: baz33):
Is it a regression related to this ticket 
https://issues.apache.org/jira/browse/KAFKA-14867 ?

[jira] [Comment Edited] (KAFKA-16361) Rack aware sticky assignor minQuota violations

2024-03-13 Thread Laymain (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826042#comment-17826042
 ] 

Laymain edited comment on KAFKA-16361 at 3/13/24 12:56 PM:
---

Hi there, we have the exact same problem here; I was about to open an issue.

If it helps, here are some logs involving only two hosts (i-0da0437e61e61bf88 
and i-0d2e25eb1aebefab5): [^illegalstateexception.log]


was (Author: laymain):
Hi there, we have the exact same problem here, I was about to open an issue.

If it can help, here some log involving only two hosts (i-0da0437e61e61bf88 and 
i-0d2e25eb1aebefab5): [^illegalstateexception.log]

[jira] [Comment Edited] (KAFKA-16361) Rack aware sticky assignor minQuota violations

2024-03-13 Thread Laymain (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17826042#comment-17826042
 ] 

Laymain edited comment on KAFKA-16361 at 3/13/24 12:52 PM:
---

Hi there, we have the exact same problem here, I was about to open an issue.

If it can help, here some log involving only two hosts (i-0da0437e61e61bf88 and 
i-0d2e25eb1aebefab5): [^illegalstateexception.log]


was (Author: laymain):
Hi there, we have the exact same problem here, I was about to open an issue.


If it can help, here some log involving two hosts (i-0da0437e61e61bf88 and 
i-0d2e25eb1aebefab5): [^illegalstateexception.log]
