trystanj commented on code in PR #586: URL: https://github.com/apache/flink-kubernetes-operator/pull/586#discussion_r1583828939
########## flink-kubernetes-operator-autoscaler/src/main/java/org/apache/flink/kubernetes/operator/autoscaler/config/AutoScalerOptions.java: ##########
@@ -68,15 +68,16 @@ private static ConfigOptions.OptionBuilder autoScalerConfig(String key)
     public static final ConfigOption<Double> TARGET_UTILIZATION_BOUNDARY =
             autoScalerConfig("target.utilization.boundary")
                     .doubleType()
-                    .defaultValue(0.1)
+                    .defaultValue(0.4)

Review Comment:
   @mxm, your illustration was helpful. I'm having a hard time understanding why the autoscaler makes the decisions it does. The logs, metrics, and definitions of terms are somewhat vague (e.g. I never knew what was meant by "true processing rate" until I saw you explain it here). I can open a new ticket to document these, and perhaps work on it myself, because I think they are critical definitions 😄

   But is source backlog considered here as well? I have a source that is lagging quite substantially but never seems to scale out, because the "true processing rate" suggests it should be keeping up, even though it never does. The catch-up duration has no effect, either.

   At the root of what I am asking: even if the "true" rate is _calculated_ to be sufficient to catch up, _but the job isn't actually catching up_ (say, because a source is doing something inefficient with its Kafka poll settings), is there any mechanism in place to detect this and trigger a scale-out of that source vertex?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
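To make the question above concrete, here is a simplified sketch of the kind of arithmetic the discussion refers to. This is not the operator's actual implementation; all method names, numbers, and the utilization handling are illustrative assumptions. It shows how a required rate could be derived from the incoming rate, backlog, and catch-up duration, and why an over-estimated "true processing rate" would suppress a needed scale-out of a source vertex.

```java
/**
 * Illustrative sketch (NOT the Flink autoscaler's real code) of scaling a
 * vertex based on its observed "true processing rate": the rate the vertex
 * can actually process records at when it is busy.
 */
public class ScaleSketch {

    /**
     * Rate the vertex must sustain to keep up AND drain its backlog within
     * the configured catch-up duration (all values are per-vertex aggregates).
     */
    static double requiredRate(double incomingRecordsPerSec,
                               double backlogRecords,
                               double catchUpSeconds) {
        // Keep up with new input, plus clear the backlog spread over the window.
        return incomingRecordsPerSec + backlogRecords / catchUpSeconds;
    }

    /**
     * New parallelism, scaling proportionally to the rate deficit; the
     * utilization target leaves headroom (hypothetical formula).
     */
    static int newParallelism(int currentParallelism,
                              double trueProcessingRate,
                              double requiredRate,
                              double targetUtilization) {
        double usableRate = trueProcessingRate * targetUtilization;
        double scaleFactor = requiredRate / usableRate;
        return (int) Math.ceil(currentParallelism * scaleFactor);
    }

    public static void main(String[] args) {
        // 1000 rec/s incoming, 600,000 records of lag, 5-minute catch-up window:
        double required = requiredRate(1000, 600_000, 300); // 1000 + 2000 = 3000.0
        // Vertex measured at 2500 rec/s true rate, 0.8 utilization target:
        int p = newParallelism(4, 2500, required, 0.8);     // ceil(4 * 3000/2000) = 6
        System.out.println(required + " " + p);             // prints "3000.0 6"

        // The commenter's failure mode: if the measured true processing rate is
        // inflated (say 4000 rec/s), usableRate = 3200 >= 3000, scaleFactor <= 1,
        // and no scale-out happens even though the job never actually catches up.
    }
}
```

In this sketch, a source that is inefficient at poll time (but does not appear busy) would report a high true processing rate, so the computed scale factor never exceeds 1; detecting the discrepancy would require also watching whether the backlog is in fact shrinking.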