Xin Gao created FLINK-39981:
-------------------------------
Summary: Autoscaler cannot trigger source rebalance after active
split removal on dynamic Kafka source
Key: FLINK-39981
URL: https://issues.apache.org/jira/browse/FLINK-39981
Project: Flink
Issue Type: Improvement
Components: Autoscaler, Kubernetes Operator
Reporter: Xin Gao
The autoscaler derives the Kafka source partition count by enumerating metric
names that match the pattern:
KafkaSourceReader.topic.<topic>.partition.<id>.currentOffset
After DynamicKafkaSource removes Kafka clusters/topics/splits due to metadata
changes, the metric names of the removed splits may remain visible through the
aggregated metrics endpoint. The autoscaler therefore keeps counting the
removed partitions as active partitions.
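To illustrate the counting problem, here is a minimal sketch of name-based partition counting. It is not the autoscaler's actual code: the function name count_source_partitions, the regular expression, and the topic names "orders" and "legacy" are all hypothetical, and only the metric name pattern comes from the description above.

```python
import re

# Pattern taken from the issue text. Topic names may contain dots, so the
# topic group is greedy and the partition id is restricted to digits.
PARTITION_METRIC = re.compile(
    r"KafkaSourceReader\.topic\.(?P<topic>.+)\.partition\.(?P<id>\d+)\.currentOffset$"
)


def count_source_partitions(metric_names):
    """Count distinct (topic, partition) pairs found in the metric names."""
    partitions = set()
    for name in metric_names:
        match = PARTITION_METRIC.search(name)
        if match:
            partitions.add((match.group("topic"), int(match.group("id"))))
    return len(partitions)


# Hypothetical metric names: 3 partitions still active, 5 already removed.
active = [f"KafkaSourceReader.topic.orders.partition.{i}.currentOffset" for i in range(3)]
stale = [f"KafkaSourceReader.topic.legacy.partition.{i}.currentOffset" for i in range(5)]
unrelated = ["KafkaSourceReader.commitsSucceeded"]

# Name enumeration cannot tell stale names from active ones.
visible = active + stale + unrelated
num_source_partitions = count_source_partitions(visible)
```

Because the count depends only on which names are visible, the stale names are counted together with the active ones, which matches the behavior reported here.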
*Example:*
- Source parallelism: 6
- Active Kafka partitions before metadata update: 8
- Metadata update removes a Kafka cluster containing 5 partitions
- Active Kafka partitions after update: 3
- currentOffset metric names of the removed partitions remain visible
- Autoscaler still calculates numSourcePartitions = 8
- Source parallelism remains 6
*Expected behavior:*
The autoscaler should calculate numSourcePartitions from currently {*}active
Kafka splits only{*}. After the metadata update, numSourcePartitions should
become 3 and source parallelism should be capped to <= 3 on the next autoscaler
evaluation.
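The expected cap can be worked through with the numbers from the example. This is a simplified sketch that assumes the cap is a plain minimum of the current parallelism and the partition count; the function name capped_source_parallelism is hypothetical, and the real autoscaler logic may apply further adjustments.

```python
def capped_source_parallelism(current_parallelism, num_source_partitions):
    """Cap source parallelism at the number of source partitions."""
    # Assumption: a non-positive count means the partition count is unknown,
    # so the parallelism is left unchanged.
    if num_source_partitions <= 0:
        return current_parallelism
    return min(current_parallelism, num_source_partitions)


# Stale count (8 partitions): the cap has no effect, parallelism stays at 6.
with_stale_count = capped_source_parallelism(6, 8)

# Active-only count (3 partitions): parallelism is capped to 3.
with_active_count = capped_source_parallelism(6, 3)
```

With the stale count of 8 the cap never engages, which is why the source stays at parallelism 6 even though only 3 partitions remain.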
*Actual behavior:*
The autoscaler continues counting stale currentOffset metric names and does not
scale down. The source remains {*}over-parallelized with empty subtasks{*}.