[ https://issues.apache.org/jira/browse/FLINK-39564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis-Mircea Ciupitu closed FLINK-39564.
-----------------------------------------
Resolution: Invalid
The current behavior is fine; it only needs to be covered by unit tests, and
that does not require a Jira ticket.
> Make ScalingRealizer overrides idempotent
> -----------------------------------------
>
> Key: FLINK-39564
> URL: https://issues.apache.org/jira/browse/FLINK-39564
> Project: Flink
> Issue Type: Improvement
> Components: Autoscaler, Kubernetes Operator
> Affects Versions: kubernetes-operator-1.14.0
> Reporter: Dennis-Mircea Ciupitu
> Priority: Major
> Labels: pull-request-available
> Fix For: kubernetes-operator-1.15.0
>
>
> Today {{JobAutoScalerImpl#scale}} invokes {{applyParallelismOverrides}} and
> {{applyConfigOverrides}} on every reconcile cycle from its {{finally}} block.
> Both paths unconditionally rewrite {{spec.flinkConfiguration}} and emit a
> {{LOG.debug("Applying ... overrides")}} line, even when the spec already
> reflects the autoscaler's persisted decision. This produces redundant
> in-memory mutations of the JOSDK informer-cached resource and noisy logs in
> steady state.
> This ticket proposes pushing the redundancy check down into
> {{KubernetesScalingRealizer}}:
> * {{realizeParallelismOverrides}} short-circuits when
> {{spec.flinkConfiguration[parallelism.overrides]}} already equals the
> override string the autoscaler would produce.
> * {{realizeConfigOverrides}} short-circuits when every removal target is
> already absent and every override entry already matches the spec value, and
> avoids churning {{taskManager.resource.memory}} when total memory tuning
> produced the same target as the current value.
> * The {{LOG.debug("Applying ... overrides")}} lines move into the realizer
> so they fire 1:1 with real spec mutations.
> * The orchestrator ({{JobAutoScalerImpl#scale}}) keeps invoking the
> realizer unconditionally from its {{finally}} block, preserving the
> self-healing invariant against in-flight upgrades that may externally reset
> {{spec.flinkConfiguration}} (last-state recovery, blue/green promotion,
> manual {{kubectl edit}}, JOSDK informer cache lag).
> No behavioral change is visible to end users: the same overrides land in
> the same place. The improvement is internal: fewer no-op writes, cleaner debug
> logs, and a clearer division of responsibilities between orchestrator (always
> project) and realizer (only mutate on drift).
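The "orchestrator always projects, realizer only mutates on drift" split can be illustrated with a minimal Java sketch. This is not the operator's code. It assumes {{spec.flinkConfiguration}} can be modelled as a plain {{Map<String, String>}}, uses hypothetical names ({{IdempotentRealizerSketch}}, {{toOverrideString}}, {{reconcile}}, the {{writes}} counter), replaces {{LOG.debug}} with {{System.out.println}}, and assumes the override key is Flink's {{pipeline.jobvertex-parallelism-overrides}}, which the ticket abbreviates as {{parallelism.overrides}}. The {{taskManager.resource.memory}} case is not modelled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical sketch: orchestrator always projects, realizer only mutates on drift. */
class IdempotentRealizerSketch {

    // Assumed key; the ticket abbreviates it as "parallelism.overrides".
    static final String PARALLELISM_OVERRIDES_KEY = "pipeline.jobvertex-parallelism-overrides";

    // Counts real spec mutations (illustration only).
    static int writes = 0;

    /** Serializes vertex -> parallelism sorted by vertex id, so equal decisions give equal strings. */
    static String toOverrideString(Map<String, Integer> parallelism) {
        return new TreeMap<>(parallelism).entrySet().stream()
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.joining(","));
    }

    /** Writes the override string only if it differs from the spec; returns true if it wrote. */
    static boolean realizeParallelismOverrides(Map<String, String> conf, Map<String, Integer> parallelism) {
        String target = toOverrideString(parallelism);
        if (target.equals(conf.get(PARALLELISM_OVERRIDES_KEY))) {
            return false; // spec already reflects the decision: no write, no log
        }
        System.out.println("Applying parallelism overrides: " + target); // stands in for LOG.debug
        conf.put(PARALLELISM_OVERRIDES_KEY, target);
        writes++;
        return true;
    }

    /** Skips the write when every removal target is absent and every override already matches. */
    static boolean realizeConfigOverrides(
            Map<String, String> conf, Set<String> removals, Map<String, String> overrides) {
        boolean removalsDone = removals.stream().noneMatch(conf::containsKey);
        boolean overridesDone = overrides.entrySet().stream()
                .allMatch(e -> Objects.equals(conf.get(e.getKey()), e.getValue()));
        if (removalsDone && overridesDone) {
            return false; // nothing to remove, nothing to change
        }
        System.out.println("Applying config overrides: " + overrides); // stands in for LOG.debug
        removals.forEach(conf::remove);
        conf.putAll(overrides);
        writes++;
        return true;
    }

    /** Simplified stand-in for one JobAutoScalerImpl#scale cycle. */
    static void reconcile(Map<String, String> conf, Map<String, Integer> parallelism) {
        try {
            // ... metric collection and scaling evaluation would run here ...
        } finally {
            // Always invoked, so an externally reset spec is healed on the next cycle.
            realizeParallelismOverrides(conf, parallelism);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        Map<String, Integer> decision = Map.of("c3d4", 2, "a1b2", 4);
        reconcile(conf, decision); // drift: writes and logs
        reconcile(conf, decision); // steady state: no-op
        conf.remove(PARALLELISM_OVERRIDES_KEY); // simulate an external reset, e.g. kubectl edit
        reconcile(conf, decision); // drift again: self-heals
        System.out.println("writes = " + writes); // writes = 2
    }
}
```

Running {{main}} logs one "Applying" line on the first cycle, nothing on the steady-state cycle, and a second line after the simulated reset. In this sketch the compare-before-write check depends on serializing the override string deterministically, which is why vertex ids are sorted through a {{TreeMap}}.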