[ https://issues.apache.org/jira/browse/FLINK-35594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gyula Fora closed FLINK-35594.
------------------------------
    Resolution: Duplicate

> Downscaling doesn't release TaskManagers.
> -----------------------------------------
>
>                 Key: FLINK-35594
>                 URL: https://issues.apache.org/jira/browse/FLINK-35594
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.18.1
>         Environment: * Flink 1.18.1 (Java 11, Temurin)
> * Kubernetes Operator 1.8
> * Kubernetes v1.28.9-eks-036c24b (AWS EKS)
>
> Autoscaling configuration:
> {code:yaml}
> jobmanager.scheduler: adaptive
> job.autoscaler.enabled: "true"
> job.autoscaler.metrics.window: 15m
> job.autoscaler.stabilization.interval: 15m
> job.autoscaler.scaling.effectiveness.threshold: 0.2
> job.autoscaler.target.utilization: "0.75"
> job.autoscaler.target.utilization.boundary: "0.25"
> job.autoscaler.metrics.busy-time.aggregator: "AVG"
> job.autoscaler.restart.time-tracking.enabled: "true"
> {code}
>            Reporter: Aviv Dozorets
>            Priority: Major
>         Attachments: Screenshot 2024-06-10 at 12.50.37 PM.png
>
> (Follow-up to a Slack conversation in the #troubleshooting channel.)
>
> I recently observed behavior that should be improved: a Flink DataStream job
> running with the autoscaler (backed by the Kubernetes Operator) and the
> Adaptive scheduler does not release nodes (TaskManagers) when scaling down.
> In my case the job started with an initial parallelism of 64 on 4
> TaskManagers with 16 cores each (a 1:1 core:slot ratio, so 64 slots in
> total) and was scaled down to a parallelism of 16.
>
> My expectation: since a parallelism of 16 fits into the 16 slots of a single
> TaskManager, only 1 TaskManager should remain up and running.
>
> Reality: all 4 initial TaskManagers keep running, each with a different
> number of free slots.
>
> I could not find an existing configuration option that changes this behavior.
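For readers reproducing the setup: a minimal sketch of how a configuration like the one quoted above is typically embedded in a FlinkDeployment custom resource managed by the Kubernetes Operator. The resource name, image tag, memory size, and jar path below are illustrative assumptions, not details from the report:

{code:yaml}
# Hypothetical FlinkDeployment showing where the reported autoscaler
# options live; names, image, and sizing are assumptions.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: example-job                      # hypothetical name
spec:
  image: flink:1.18.1-java11
  flinkVersion: v1_18
  flinkConfiguration:
    jobmanager.scheduler: adaptive
    job.autoscaler.enabled: "true"
    job.autoscaler.metrics.window: 15m
    job.autoscaler.stabilization.interval: 15m
    taskmanager.numberOfTaskSlots: "16"  # 1:1 core:slot, as in the report
  taskManager:
    resource:
      cpu: 16
      memory: 32g                        # assumed; not stated in the report
  job:
    jarURI: local:///opt/flink/usrlib/job.jar   # hypothetical path
    parallelism: 64
    upgradeMode: savepoint
{code}

With 16 slots per TaskManager, the operator requests 4 TaskManagers for the initial parallelism of 64, which matches the topology described in the ticket.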
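On the reporter's last point: with the default scheduler, idle resources are normally reclaimed through the timeouts sketched below. Whether they take effect under the adaptive scheduler is precisely what this ticket questions, so they are listed only as the knobs one would ordinarily try; the values are the documented defaults from the Flink 1.18 configuration reference, not a confirmed fix:

{code:yaml}
# Resource-release timeouts that exist in Flink 1.18 (documented defaults;
# shown for orientation, not as a verified remedy for this issue).
slot.idle.timeout: "50000"                    # ms before an idle slot is freed
resourcemanager.taskmanager-timeout: "30000"  # ms before an idle TaskManager is released
# How long the adaptive scheduler waits for the available resources to
# stabilize before (re)scaling the job:
jobmanager.adaptive-scheduler.resource-stabilization-timeout: 10s
{code}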