Dennis-Mircea commented on code in PR #1075:
URL:
https://github.com/apache/flink-kubernetes-operator/pull/1075#discussion_r3156286147
##########
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobAutoScalerImpl.java:
##########
@@ -107,12 +109,16 @@ public void scale(Context ctx) throws Exception {
} catch (Throwable e) {
onError(ctx, autoscalerMetrics, e);
} finally {
- try {
- applyParallelismOverrides(ctx);
- applyConfigOverrides(ctx);
- } catch (Exception e) {
- LOG.error("Error applying overrides.", e);
- onError(ctx, autoscalerMetrics, e);
+ // Skip applying overrides when the job is not yet in a stable running state,
+ // as they may have already been applied during the previous scaling cycle.
+ if (!waiting) {
+ try {
+ applyParallelismOverrides(ctx);
+ applyConfigOverrides(ctx);
Review Comment:
You're right, the `if (!waiting)` skip breaks the self-heal that the
unconditional `finally` invocation was implicitly providing: if anything resets
`spec.flinkConfiguration` between cycles while the job is mid-upgrade, the next
reconcile diffs against `lastReconciledSpec` and fires a spurious second
upgrade.
I've reworked the PR to keep the realizer always invoked from `finally` and
pushed the idempotency check down into `KubernetesScalingRealizer`, so
steady-state cycles do no spec writes and emit no debug noise, while drift is
still self-healed. I also opened a JIRA as part of it:
[FLINK-39564](https://issues.apache.org/jira/browse/FLINK-39564).
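
For readers following along, here is a rough sketch of what "pushing the idempotency check down" could look like. It is not the code from the PR: the class `IdempotentRealizerSketch`, the method `applyOverrides`, and the vertex id in the example are hypothetical, and a plain `Map` stands in for `spec.flinkConfiguration`. The idea is only that the realizer compares each desired override with the current spec value and writes solely on a mismatch, so it can be called from `finally` on every cycle.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not the actual KubernetesScalingRealizer. */
public class IdempotentRealizerSketch {

    /**
     * Writes each override into the spec configuration only if the current
     * value differs. Returns true if at least one entry was written.
     */
    public static boolean applyOverrides(
            Map<String, String> specConfig, Map<String, String> overrides) {
        boolean changed = false;
        for (Map.Entry<String, String> override : overrides.entrySet()) {
            String current = specConfig.get(override.getKey());
            if (!override.getValue().equals(current)) {
                // Drift (or first application): restore the desired value.
                specConfig.put(override.getKey(), override.getValue());
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, String> spec = new HashMap<>();
        Map<String, String> overrides = new HashMap<>();
        // The vertex id "vertex-a" is a made-up placeholder.
        overrides.put("pipeline.jobvertex-parallelism-overrides", "vertex-a:2");

        System.out.println(applyOverrides(spec, overrides)); // first cycle writes
        System.out.println(applyOverrides(spec, overrides)); // steady state, no write
        spec.clear();                                        // simulate a reset
        System.out.println(applyOverrides(spec, overrides)); // drift is healed
    }
}
```

Under these assumptions the steady-state call is a pure read, which is why it no longer needs to be guarded by `if (!waiting)`.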