Dennis-Mircea Ciupitu created FLINK-39564:
---------------------------------------------
Summary: Make ScalingRealizer overrides idempotent
Key: FLINK-39564
URL: https://issues.apache.org/jira/browse/FLINK-39564
Project: Flink
Issue Type: Improvement
Components: Autoscaler, Kubernetes Operator
Affects Versions: kubernetes-operator-1.14.0
Reporter: Dennis-Mircea Ciupitu
Fix For: kubernetes-operator-1.15.0
Today {{JobAutoScalerImpl#scale}} invokes {{applyParallelismOverrides}} and
{{applyConfigOverrides}} on every reconcile cycle from its {{finally}} block.
Both paths unconditionally rewrite {{spec.flinkConfiguration}} and emit a
{{LOG.debug("Applying ... overrides")}} line, even when the spec already
reflects the autoscaler's persisted decision. This produces redundant in-memory
mutations of the JOSDK informer-cached resource and noisy logs in steady state.
This ticket proposes pushing the redundancy check down into
{{KubernetesScalingRealizer}}:
* {{realizeParallelismOverrides}} short-circuits when
{{spec.flinkConfiguration[parallelism.overrides]}} already equals the override
string the autoscaler would produce.
* {{realizeConfigOverrides}} short-circuits when every removal target is
already absent and every override entry already matches the spec value, and
avoids churning {{taskManager.resource.memory}} when total memory tuning
produced the same target as the current value.
* The {{LOG.debug("Applying ... overrides")}} lines move into the realizer so
they fire 1:1 with real spec mutations.
* The orchestrator ({{JobAutoScalerImpl#scale}}) keeps invoking the
realizer unconditionally from its {{finally}} block, preserving the
self-healing invariant against in-flight upgrades that may externally reset
{{spec.flinkConfiguration}} (last-state recovery, blue/green promotion, manual
{{kubectl edit}}, JOSDK informer cache lag).
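The short-circuit for {{realizeParallelismOverrides}} could look roughly like the sketch below. It is a minimal illustration, not the operator's actual code: the class {{ParallelismOverrideSketch}}, the plain {{Map}} standing in for {{spec.flinkConfiguration}}, the {{vertexId:parallelism}} serialization with sorted keys, and the boolean return value are all assumptions made for the example. The key {{parallelism.overrides}} is the shorthand used in this ticket, and {{System.out.println}} stands in for {{LOG.debug}}.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of an idempotent parallelism-override realizer. */
final class ParallelismOverrideSketch {
    // Placeholder key, copied from the shorthand used in the ticket text.
    static final String OVERRIDES_KEY = "parallelism.overrides";

    /** Serializes overrides deterministically (sorted by vertex id) so equal maps yield equal strings. */
    static String serialize(Map<String, String> overrides) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(overrides).entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    /** Writes the override string only on drift; returns true if the config was mutated. */
    static boolean realizeParallelismOverrides(
            Map<String, String> flinkConf, Map<String, String> overrides) {
        if (overrides.isEmpty()) {
            return false; // assumption: nothing to project when there is no decision yet
        }
        String desired = serialize(overrides);
        if (desired.equals(flinkConf.get(OVERRIDES_KEY))) {
            return false; // spec already reflects the persisted decision
        }
        // Stand-in for LOG.debug: fires only when the spec really changes.
        System.out.println("Applying parallelism overrides: " + desired);
        flinkConf.put(OVERRIDES_KEY, desired);
        return true;
    }
}
```

Calling the method twice with the same inputs mutates the map and logs on the first call only; the second call returns {{false}} without touching the map. The deterministic serialization matters here, because an unordered rendering of the same overrides would look like drift on every cycle.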
End users see no behavioral change, because the same overrides land in the
same place. The improvement is internal: fewer no-op writes, cleaner debug logs,
and a clearer division of responsibilities between the orchestrator (always
projects the persisted decision) and the realizer (mutates only on drift).
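The split between an orchestrator that always projects and a realizer that mutates only on drift can be sketched as follows for config overrides. This is an illustrative approximation under stated assumptions: {{ConfigOverrideSketch}}, its method signatures, and the mutation counter are hypothetical, a plain {{Map}} stands in for {{spec.flinkConfiguration}}, the removal and override key sets are assumed to be disjoint, and memory tuning is left out.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: orchestrator always calls, realizer mutates only on drift. */
final class ConfigOverrideSketch {

    /** Applies removals and overrides; returns true only when the config actually changed. */
    static boolean realizeConfigOverrides(
            Map<String, String> flinkConf, Set<String> removals, Map<String, String> overrides) {
        // Every removal target is already absent.
        boolean removalsAbsent = removals.stream().noneMatch(flinkConf::containsKey);
        // Every override entry already matches the current value (values assumed non-null).
        boolean overridesMatch = overrides.entrySet().stream()
                .allMatch(e -> e.getValue().equals(flinkConf.get(e.getKey())));
        if (removalsAbsent && overridesMatch) {
            return false; // no drift, so no write and no log line
        }
        removals.forEach(flinkConf::remove);
        flinkConf.putAll(overrides);
        return true;
    }

    /** Stand-in for the orchestrator: invokes the realizer from finally on every cycle. */
    static int reconcile(
            Map<String, String> flinkConf,
            Set<String> removals,
            Map<String, String> overrides,
            int cycles) {
        int mutations = 0;
        for (int i = 0; i < cycles; i++) {
            try {
                // Metric collection and scaling decisions would run here.
            } finally {
                if (realizeConfigOverrides(flinkConf, removals, overrides)) {
                    mutations++;
                }
            }
        }
        return mutations;
    }
}
```

In steady state, several cycles produce a single mutation. If something outside the autoscaler resets a value between cycles, the next unconditional call detects the drift and restores it, which is the self-healing behavior the orchestrator is meant to preserve.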