[
https://issues.apache.org/jira/browse/FLINK-38517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gyula Fora closed FLINK-38517.
------------------------------
Fix Version/s: kubernetes-operator-1.14.0
Assignee: Daniel Rossos
Resolution: Fixed
merged to main c479a460b1d9f62d685d2d6208ec275fef8938ce
> Unable to Transition to Green for FlinkBlueGreenDeployment with "Ignored
> Fields"
> --------------------------------------------------------------------------------
>
> Key: FLINK-38517
> URL: https://issues.apache.org/jira/browse/FLINK-38517
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Reporter: Daniel Rossos
> Assignee: Daniel Rossos
> Priority: Major
> Labels: pull-request-available
> Fix For: kubernetes-operator-1.14.0
>
>
> Found an issue with the newly added BlueGreen phase 1 deployment in the Flink
> Kuberentes operator. Merged
> [PR|https://github.com/apache/flink-kubernetes-operator/pull/969]
> TL;DR is when we clear the "Ignored Fields" (using
> ReflectiveDiffBuilder.clearIgnoredFields()) when doing a check between our
> FlinkBlueGreen template's spec and the spec of the Green-FlinkDeployment, we
> strip these fields, I think unintentionally, after we do the equality check.
> This will cause equality check to fail if only difference is these ignored
> fields.
> Longer breakdown of the trace / process we found in this loop:
> Flow and exact areas where this issue becomes evident:
> 1. You create a FlinkBlueGreenDeployment with fields that would be cleared by
> ReflectiveDiffBuilder.clearIgnoredFields(). In our example
> `spec.template.spec.podTemplate.apiVersion` and
> `spec.template.spec.podTemplate.kind`. After then having your
> Blue-FlinkDeployment enter a stable state (which will then set the
> `lastReconciledSpec` json-string to the proper FlinkBlueGreenDeployment spec
> including the additionalProperties), trigger a Green-FlinkDeployment to spin
> up by making a spec change to the FlinkBlueGreenDeployment
> 2. As part of this transition,
> [`TransitioningStateHandler.handle`|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/bluegreen/handlers/TransitioningStateHandler.java#L38]
> will then call
> [`BlueGreenDeploymentService.handleSpecChangesDuringTransition`|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/bluegreen/BlueGreenDeploymentService.java#L400]
> which will then attempt to see if there is any spec changes by calling
> [`FlinkBlueGreenDeploymentSpecDiff.compare`|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/diff/FlinkBlueGreenDeploymentSpecDiff.java#L58].
> 3. `FlinkBlueGreenDeploymentSpecDiff.compare` will first check if the
> FlinkBlueGreenDeployment spec from the CR is equal to the lastReconciledState
> spec yaml which is stored as a field yaml-string and updated based on last
> resolution attempt. This will return false because of the changes you just
> made to the FlinkBlueGreenDeploymentSpec that triggered this Blue-Green
> transition
> 4. This will then trigger a DiffResult to be generated using
> `ReflectiveDiffBuilder`, which as part of the constructor calls
> [`ReflectiveDiffBuilder.clearIgnoredFields()`|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/diff/ReflectiveDiffBuilder.java#L192]
> which removes all addititionalProperties from both your left and right
> specs. Note this strip happens _after_ the equality check.
> 5. The spec that is supposed to now represent the FlinkBlueGreenDeployment
> spec will now be written to the lastReconciledSpec but now is critically
> missing the additionalProperties field.
> 6. n-th loop triggers of monitoring transition after the
> Green-FlinkDeployment is in a totally healthy running state, but the
> `FlinkBlueGreenDeploymentSpecDiff.compare` will constantly be comparing the
> actual FlinkBlueGreenDeployment spec which contains additionalProperties and
> the lastReconciledSpec that has had them stripped.
> 7. This causes infinite loop / never reconciling state on transition and
> breaks the entire FlinkBlueGreenDeployment flow
> A fix / way to show this is the case is by moving the call to create the
> ReflectiveDiffBuilder to before the line:
> ```
> boolean specsEqual = leftSpec.equals(rightSpec);
> ```
> Then everything works as expected because we update the specs to remove
> ignored fields before checking if they are equal. I am interested to hear if
> there are other / more time efficient alternative solutions
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)