X-czh commented on code in PR #586: URL: https://github.com/apache/flink-kubernetes-operator/pull/586#discussion_r1186953244
##########
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/config/AutoScalerOptions.java:
##########
@@ -87,28 +88,28 @@ private static ConfigOptions.OptionBuilder autoScalerConfig(String key)
     public static final ConfigOption<Integer> VERTEX_MAX_PARALLELISM =
             autoScalerConfig("vertex.max-parallelism")
                     .intType()
-                    .defaultValue(Integer.MAX_VALUE)
+                    .defaultValue(200)
                     .withDescription(
                             "The maximum parallelism the autoscaler can use. Note that this limit will be ignored if it is higher than the max parallelism configured in the Flink config or directly on each operator.");

     public static final ConfigOption<Double> MAX_SCALE_DOWN_FACTOR =
             autoScalerConfig("scale-down.max-factor")
                     .doubleType()
-                    .defaultValue(0.6)
+                    .defaultValue(1.0)

Review Comment:
   Curious why we chose to loosen it to 1.0. We found that the TPR (true processing rate) tends to be overestimated significantly, leading to overly aggressive downscaling when:
   - the pipeline is underloaded and far from its optimum;
   - the average CPU allocated per slot is < 1.

   The reason is that linear scaling based on busy-time metrics assumes no resource competition between tasks as the load is pushed up. However, when the average CPU allocated per slot is small, resource competition between tasks becomes more and more severe as we push up the overall load of the pipeline.
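To make the reviewer's concern concrete, here is a minimal, hedged sketch of the linear-scaling idea being discussed. All names (`trueProcessingRate`, `targetParallelism`, `clampScaleDown`) are illustrative, not the operator's actual API: TPR is extrapolated from busy time as `observedRate / busyFraction`, which silently assumes tasks never compete for CPU, and `scale-down.max-factor` bounds how far parallelism may drop in a single step.

```java
// Illustrative sketch only -- not the flink-kubernetes-operator implementation.
public class LinearScalingSketch {

    /** Extrapolated per-subtask capacity from busy-time metrics (linear scaling). */
    static double trueProcessingRate(double observedRate, double busyFraction) {
        // Assumes throughput grows linearly as busy time approaches 100%,
        // i.e. no CPU competition between tasks -- the assumption the reviewer
        // says breaks down when average CPU allocated per slot is < 1.
        return observedRate / busyFraction;
    }

    /** Parallelism needed so that total capacity matches the target rate. */
    static int targetParallelism(double targetRate, double tprPerSubtask) {
        return (int) Math.ceil(targetRate / tprPerSubtask);
    }

    /** scale-down.max-factor limits the fraction of parallelism removed per step. */
    static int clampScaleDown(int current, int target, double maxScaleDownFactor) {
        int lowerBound = Math.max(1, (int) Math.ceil(current * (1 - maxScaleDownFactor)));
        return Math.max(target, lowerBound);
    }

    public static void main(String[] args) {
        // Underloaded pipeline: 1,000 rec/s observed at 10% busy time
        // extrapolates to a (possibly overestimated) TPR of 10,000 rec/s.
        double tpr = trueProcessingRate(1_000, 0.1);
        int current = 100;
        int target = targetParallelism(100_000, tpr); // 10 subtasks "needed"

        // Old default 0.6: at most 60% removed per step, so 100 -> 40.
        System.out.println(clampScaleDown(current, target, 0.6)); // 40
        // New default 1.0: no limit, 100 -> 10 in a single step.
        System.out.println(clampScaleDown(current, target, 1.0)); // 10
    }
}
```

With the factor at 1.0, an overestimated TPR translates directly into the full downscale in one step, which is why the reviewer questions loosening the default.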