Dennis-Mircea Ciupitu created FLINK-39564:
---------------------------------------------

             Summary: Make ScalingRealizer overrides idempotent
                 Key: FLINK-39564
                 URL: https://issues.apache.org/jira/browse/FLINK-39564
             Project: Flink
          Issue Type: Improvement
          Components: Autoscaler, Kubernetes Operator
    Affects Versions: kubernetes-operator-1.14.0
            Reporter: Dennis-Mircea Ciupitu
             Fix For: kubernetes-operator-1.15.0


Today {{JobAutoScalerImpl#scale}} invokes {{applyParallelismOverrides}} and 
{{applyConfigOverrides}} on every reconcile cycle from its {{finally}} block. 
Both paths unconditionally rewrite {{spec.flinkConfiguration}} and emit a 
{{LOG.debug("Applying ... overrides")}} line, even when the spec already 
reflects the autoscaler's persisted decision. This produces redundant in-memory 
mutations of the JOSDK informer-cached resource and noisy logs in steady state.

This ticket proposes pushing the redundancy check down into 
{{KubernetesScalingRealizer}}:
 * {{realizeParallelismOverrides}} short-circuits when 
{{spec.flinkConfiguration[parallelism.overrides]}} already equals the override 
string the autoscaler would produce.
 * {{realizeConfigOverrides}} short-circuits when every removal target is 
already absent and every override entry already matches the spec value, and 
avoids churning {{taskManager.resource.memory}} when total memory tuning 
produced the same target as the current value.
 * The {{LOG.debug("Applying ... overrides")}} lines move into the realizer so 
they fire 1:1 with real spec mutations.
 * The orchestrator ({{JobAutoScalerImpl#scale}}) keeps invoking the 
realizer unconditionally from its {{finally}} block, preserving the 
self-healing invariant against in-flight upgrades that may externally reset 
{{spec.flinkConfiguration}} (last-state recovery, blue/green promotion, manual 
{{kubectl edit}}, JOSDK informer cache lag).
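
The short-circuit checks listed above could look roughly like the sketch below. It is not the operator's actual code: the class name, method signatures, boolean return values, the "vertex:parallelism" string format, and the use of a plain Map in place of spec.flinkConfiguration are all assumptions made for illustration, and the key name simply reuses the shorthand from this ticket. The memory-tuning case is left out.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.logging.Logger;
import java.util.stream.Collectors;

/** Illustrative only: a plain Map stands in for spec.flinkConfiguration. */
final class IdempotentRealizerSketch {
    private static final Logger LOG = Logger.getLogger("IdempotentRealizerSketch");

    // Placeholder key, copied from the ticket's shorthand (assumption).
    static final String PARALLELISM_OVERRIDES_KEY = "parallelism.overrides";

    /** Returns true only if the config was actually mutated. */
    static boolean realizeParallelismOverrides(
            Map<String, String> flinkConf, Map<String, String> overrides) {
        if (overrides.isEmpty()) {
            return false; // nothing to project (assumed behavior)
        }
        // Sort by vertex id so the same decision always yields the same string.
        String desired = new TreeMap<>(overrides).entrySet().stream()
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.joining(","));
        if (desired.equals(flinkConf.get(PARALLELISM_OVERRIDES_KEY))) {
            return false; // spec already reflects the decision: no write, no log
        }
        LOG.fine("Applying parallelism overrides: " + desired);
        flinkConf.put(PARALLELISM_OVERRIDES_KEY, desired);
        return true;
    }

    /** Returns true only if at least one key had to be removed or rewritten. */
    static boolean realizeConfigOverrides(
            Map<String, String> flinkConf,
            Map<String, String> overrides,
            Set<String> removals) {
        boolean removalsAbsent = removals.stream().noneMatch(flinkConf::containsKey);
        boolean overridesMatch = overrides.entrySet().stream()
                .allMatch(e -> e.getValue().equals(flinkConf.get(e.getKey())));
        if (removalsAbsent && overridesMatch) {
            return false; // no drift
        }
        LOG.fine("Applying config overrides: " + overrides);
        removals.forEach(flinkConf::remove);
        flinkConf.putAll(overrides);
        return true;
    }
}
```

Because the debug line sits after the drift check, it fires once per real mutation. Calling either method a second time with unchanged inputs returns false and leaves the map untouched.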

No behavioral change is visible to end users: the same overrides land in the 
same place. The improvement is internal: fewer no-op writes, cleaner debug logs, 
and a clearer division of responsibilities between the orchestrator (always 
projects the persisted decision) and the realizer (mutates only on drift).
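
To make that division of responsibilities concrete, here is a minimal sketch of the loop. The names (ReconcileLoopSketch, scale, realize), the Map standing in for the spec, and the exception handling are hypothetical simplifications, not the real JobAutoScalerImpl or KubernetesScalingRealizer signatures.

```java
import java.util.Map;

/** Illustrative only: the orchestrator always projects, the realizer mutates on drift. */
final class ReconcileLoopSketch {

    /** Hypothetical realizer: writes to the spec only when it differs from the decision. */
    static boolean realize(Map<String, String> spec, Map<String, String> persisted) {
        boolean inSync = persisted.entrySet().stream()
                .allMatch(e -> e.getValue().equals(spec.get(e.getKey())));
        if (inSync) {
            return false; // steady state: no mutation
        }
        spec.putAll(persisted);
        return true;
    }

    /**
     * Hypothetical orchestrator: runs the scaling logic, then projects the
     * persisted decision from finally, even if the scaling logic failed.
     */
    static boolean scale(
            Map<String, String> spec, Map<String, String> persisted, Runnable scalingLogic) {
        boolean mutated = false;
        try {
            scalingLogic.run();
        } catch (RuntimeException e) {
            // A real orchestrator would report the failure; the sketch just continues.
        } finally {
            mutated = realize(spec, persisted);
        }
        return mutated;
    }
}
```

In this sketch a steady-state cycle returns false, while a cycle that follows an external reset of the spec returns true and restores the persisted overrides. That is the self-healing property the unconditional call is meant to preserve.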



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
