Jochen Rauschenbusch created KAFKA-20198:
--------------------------------------------
Summary: StickyPartitionAssignor with group protocol classic is
not acting sticky
Key: KAFKA-20198
URL: https://issues.apache.org/jira/browse/KAFKA-20198
Project: Kafka
Issue Type: Bug
Components: streams
Affects Versions: 4.1.1
Reporter: Jochen Rauschenbusch
Attachments: HATaskAssignorLogs.json, StickyTaskAssignorLogs.json
Problem:
During some tests, I noticed that many state stores were closed during group
rebalancing triggered by instance scaling. I assumed that the
StickyTaskAssignor was supposed to prevent exactly this. However, with each new
application instance that joined the group, the rebalancing produced a
cascade of "Handle new assignments" log entries. Scaling from one to two
application instances (each with ten stream threads) generated 429 such
entries, which seems excessive. The log entries show that almost all tasks
were moved to other group members throughout the entire rebalancing phase.
Setup:
- Scala application based on Scala 2.13 and Kafka Streams
- The application consumes from a single topic with 450 partitions
- The stream topology implements some stateful aggregations
- Change logging is disabled; only in-memory state stores are used
- Each app instance is configured to create 10 stream threads
Following libraries are used:
```
org.apache.kafka:kafka-streams:4.2.0
org.apache.kafka:kafka-streams-scala_2.13:4.2.0
org.apache.kafka:kafka-streams-test-utils:4.2.0
```
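For reference, the Streams configuration relevant to this report looks roughly like the following. This is a sketch: the application id and bootstrap servers are placeholders I made up, and `group.protocol` is the config added for the new Streams group protocol (KIP-1071 early access), with `classic` being the value under which the problem occurs.

```properties
# Sketch of the Streams configuration used in these tests (placeholder values)
application.id=sticky-assignor-repro   # hypothetical name
bootstrap.servers=kafka:9092           # placeholder
group.protocol=classic                 # classic group protocol shows the bug
num.stream.threads=10                  # ten stream threads per instance
```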
The Kafka cluster (v4.1.0) was created with the Strimzi operator v0.50.0.
I already discussed this behavior with [~lucasbru] and it seems to be a bug:
[https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1770905604912249]
Implementing a pretty simple Spring Boot app with an absolutely minimal
topology revealed the same behavior. The topology in this case didn't use
state stores at all. It just consumes from a single topic (again with 450
partitions) and logs the key/value combinations. Here too, the
rebalancing led to a cascade of task re-assignments. Again, I configured the app
to use 10 stream threads.
I also ran further tests with the HighAvailabilityTaskAssignor. Here the logic
first revokes all assigned partitions and then re-assigns the tasks in a
round-robin manner, which appears to be the expected behavior.
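For the comparison runs, the assignor can be switched via a Streams internal config. This is a sketch under my own assumptions: `internal.task.assignor.class` is an internal, unsupported config key, and the fully qualified class names below are the internal assignor implementations as I understand them in recent Kafka versions; verify both against the Kafka version in use.

```properties
# Legacy sticky assignor (the one showing the non-sticky behavior here):
internal.task.assignor.class=org.apache.kafka.streams.processor.internals.assignment.StickyTaskAssignor
# Default HA assignor used for the comparison test:
# internal.task.assignor.class=org.apache.kafka.streams.processor.internals.assignment.HighAvailabilityTaskAssignor
```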
Another test using the KIP-1071 group protocol showed that sticky task
assignment works as expected there.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)