Maximilian Michels created FLINK-33710:
------------------------------------------
Summary: Autoscaler redeploys pipeline for a NOOP parallelism change
Key: FLINK-33710
URL: https://issues.apache.org/jira/browse/FLINK-33710
Project: Flink
Issue Type: Bug
Components: Autoscaler, Kubernetes Operator
Affects Versions: kubernetes-operator-1.7.0, kubernetes-operator-1.6.0
Reporter: Maximilian Michels
Assignee: Maximilian Michels
Fix For: kubernetes-operator-1.8.0
The operator supports two modes to apply autoscaler changes:
# Use the internal Flink config {{pipeline.jobvertex-parallelism-overrides}}
# Make use of Flink's Rescale API
For (1), a string containing the actual overrides has to be generated for the
Flink config. This string has to be deterministic for a given map, but
currently it is not.
Consider the following observed log:
{noformat}
>>> Event | Info | SPECCHANGED | SCALE change(s) detected (Diff:
FlinkDeploymentSpec[flinkConfiguration.pipeline.jobvertex-parallelism-overrides
:
92542d1280187bd464274368a5f86977:3,9f979ed859083299d29f281832cb5be0:1,84881d7bda0dc3d44026e37403420039:1,1652184ffd0522859c7840a24936847c:1
->
9f979ed859083299d29f281832cb5be0:1,84881d7bda0dc3d44026e37403420039:1,92542d1280187bd464274368a5f86977:3,1652184ffd0522859c7840a24936847c:1]),
starting reconciliation.
{noformat}
The overrides are identical, but their order differs, which is detected as a
spec change and triggers a redeploy. This does not seem to happen often, but
deterministic string generation (e.g. sorting by key) is required to prevent
such NOOP updates.
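
One way to make the generated string deterministic is to sort the overrides by vertex ID before joining them. The following is a minimal sketch of that idea, not the operator's actual code: the class {{ParallelismOverrides}} and its methods {{serialize}} and {{parse}} are hypothetical names introduced for illustration, and the {{id:parallelism}} entry format is assumed from the log above.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of the operator.
final class ParallelismOverrides {

    // Serializes vertexId -> parallelism as "id:p,id:p", ordered by vertex ID.
    // Copying into a TreeMap makes the output independent of the iteration
    // order of the input map.
    static String serialize(Map<String, Integer> overrides) {
        return new TreeMap<>(overrides).entrySet().stream()
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.joining(","));
    }

    // Parses "id:p,id:p" back into a map (assumed format, see the log above).
    static Map<String, Integer> parse(String value) {
        Map<String, Integer> result = new TreeMap<>();
        if (value == null || value.trim().isEmpty()) {
            return result;
        }
        for (String entry : value.split(",")) {
            String[] kv = entry.trim().split(":", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("Malformed override: " + entry);
            }
            result.put(kv[0].trim(), Integer.parseInt(kv[1].trim()));
        }
        return result;
    }

    public static void main(String[] args) {
        // The two values from the observed log: same entries, different order.
        String before = "92542d1280187bd464274368a5f86977:3,9f979ed859083299d29f281832cb5be0:1,"
                + "84881d7bda0dc3d44026e37403420039:1,1652184ffd0522859c7840a24936847c:1";
        String after = "9f979ed859083299d29f281832cb5be0:1,84881d7bda0dc3d44026e37403420039:1,"
                + "92542d1280187bd464274368a5f86977:3,1652184ffd0522859c7840a24936847c:1";

        // Both normalize to the same string, so no spec change would be seen.
        System.out.println(serialize(parse(before)).equals(serialize(parse(after))));
    }
}
```

With sorted output, the two values from the log serialize to the same string, so a plain string comparison of the config would no longer report a difference.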
--
This message was sent by Atlassian Jira
(v8.20.10#820010)