X-czh commented on code in PR #586:
URL: https://github.com/apache/flink-kubernetes-operator/pull/586#discussion_r1186953244


##########
flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/config/AutoScalerOptions.java:
##########
@@ -87,28 +88,28 @@ private static ConfigOptions.OptionBuilder autoScalerConfig(String key) {
     public static final ConfigOption<Integer> VERTEX_MAX_PARALLELISM =
             autoScalerConfig("vertex.max-parallelism")
                     .intType()
-                    .defaultValue(Integer.MAX_VALUE)
+                    .defaultValue(200)
                     .withDescription(
                             "The maximum parallelism the autoscaler can use. 
Note that this limit will be ignored if it is higher than the max parallelism 
configured in the Flink config or directly on each operator.");
 
     public static final ConfigOption<Double> MAX_SCALE_DOWN_FACTOR =
             autoScalerConfig("scale-down.max-factor")
                     .doubleType()
-                    .defaultValue(0.6)
+                    .defaultValue(1.0)

Review Comment:
   Curious why we chose to loosen it to 1.0. We found that TPR (true processing rate) tends to be heavily overestimated, leading to overly aggressive downscaling when:
   
   - the pipeline is underloaded and far from optimal.
   - the average CPU allocated per slot is < 1.
   
   The reason is that linear scaling with busy-time metrics assumes no resource competition between tasks as the load is pushed up; however, when the average CPU allocated per slot is small, resource competition between tasks becomes more and more severe as the overall load of the pipeline rises.


