Dennis-Mircea commented on code in PR #1075:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/1075#discussion_r3156286147


##########
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobAutoScalerImpl.java:
##########
@@ -107,12 +109,16 @@ public void scale(Context ctx) throws Exception {
         } catch (Throwable e) {
             onError(ctx, autoscalerMetrics, e);
         } finally {
-            try {
-                applyParallelismOverrides(ctx);
-                applyConfigOverrides(ctx);
-            } catch (Exception e) {
-                LOG.error("Error applying overrides.", e);
-                onError(ctx, autoscalerMetrics, e);
+            // Skip applying overrides when the job is not yet in a stable running state,
+            // as they may have already been applied during the previous scaling cycle.
+            if (!waiting) {
+                try {
+                    applyParallelismOverrides(ctx);
+                    applyConfigOverrides(ctx);

Review Comment:
   You're right, the `if (!waiting)` skip breaks the self-heal that the 
unconditional `finally` invocation was implicitly providing: if anything resets 
`spec.flinkConfiguration` between cycles while the job is mid-upgrade, the next 
reconcile diffs against `lastReconciledSpec` and fires a spurious second 
upgrade.
   
   I've reworked the PR to keep the realizer always invoked from `finally` and 
pushed the idempotency check down into `KubernetesScalingRealizer`, so 
steady-state cycles do no spec writes and emit no debug noise, while drift is 
still self-healed. I also opened a JIRA ticket as part of it: 
[FLINK-39564](https://issues.apache.org/jira/browse/FLINK-39564).
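
To illustrate the idempotency idea, here is a minimal sketch, not the actual `KubernetesScalingRealizer` code from the PR. It assumes the spec configuration can be modelled as a plain `Map<String, String>`; the class `IdempotentOverrides` and its `apply` method are hypothetical names used only for this example. The point is that the write happens only when the stored value differs from the desired override, so calling it unconditionally from `finally` is cheap in steady state and still repairs drift.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for the idempotency check; names are illustrative only. */
final class IdempotentOverrides {

    /**
     * Writes each override into the spec configuration only if the current
     * value differs. Returns true if at least one entry was written.
     */
    static boolean apply(Map<String, String> specConfig, Map<String, String> overrides) {
        boolean changed = false;
        for (Map.Entry<String, String> e : overrides.entrySet()) {
            if (!Objects.equals(specConfig.get(e.getKey()), e.getValue())) {
                specConfig.put(e.getKey(), e.getValue());
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Plain map standing in for spec.flinkConfiguration (assumption).
        Map<String, String> spec = new HashMap<>();
        Map<String, String> overrides =
                Map.of("pipeline.jobvertex-parallelism-overrides", "vertex-a:2");

        System.out.println(apply(spec, overrides)); // first cycle: true, value written
        System.out.println(apply(spec, overrides)); // steady state: false, no write

        // Simulate something resetting the configuration between cycles.
        spec.remove("pipeline.jobvertex-parallelism-overrides");
        System.out.println(apply(spec, overrides)); // drift repaired: true
    }
}
```

Returning a boolean lets the caller skip logging and any follow-up spec update when nothing changed, which is what keeps steady-state cycles quiet.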



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
