[
https://issues.apache.org/jira/browse/FLINK-39243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yanis Djeridi updated FLINK-39243:
----------------------------------
Description:
h3. *Problem 1: FlinkDeployment — {{observedGeneration}} not updated when
suspended*
When a FlinkDeployment resource is created with {{spec.job.state: suspended}},
the Flink Kubernetes Operator does not update the
{{status.observedGeneration}} field or other status fields. This violates
Kubernetes API conventions and breaks integration with standard deployment
tools like Kapp that rely on {{observedGeneration}} to determine when a
controller has processed a spec change, leading such tools to hang indefinitely.
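To illustrate why such tools hang, here is a minimal sketch of the kind of check a deployment tool might run while waiting on a resource. The class and method names are hypothetical and are not taken from Kapp or from the operator; the only assumption is that the tool compares {{metadata.generation}} with {{status.observedGeneration}}.

```java
/** Hypothetical client-side readiness check; not taken from Kapp or the operator. */
final class GenerationCheck {

    private GenerationCheck() {}

    /**
     * Returns true once the controller reports that it has observed the
     * current spec generation.
     *
     * @param generation metadata.generation of the resource
     * @param observedGeneration status.observedGeneration, or null if never set
     */
    static boolean isSpecObserved(long generation, Long observedGeneration) {
        // A missing observedGeneration means the controller has not
        // acknowledged any spec yet, so a waiting tool keeps polling.
        return observedGeneration != null && observedGeneration >= generation;
    }

    public static void main(String[] args) {
        // Suspended FlinkDeployment as described above: the field is never set.
        System.out.println(isSpecObserved(1L, null)); // prints "false" on every poll
        // Expected behaviour: the operator records the generation it processed.
        System.out.println(isSpecObserved(1L, 1L));   // prints "true"
    }
}
```

With the field never populated, the first case repeats on every poll, which matches the indefinite wait described above.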
h3. *Problem 2: FlinkBlueGreenDeployment — no {{observedGeneration}} field at
all*
The FlinkBlueGreenDeployment resource does not have an {{observedGeneration}}
field in its status at all, meaning deployment tools can never determine
whether the BlueGreen controller has processed a given spec generation,
regardless of state.
h3. *Root Cause*
{+}FlinkDeployment{+}:
In the reconciliation logic for FlinkDeployment, when the operator detects a
first deployment with spec.job.state: suspended, it returns early without
updating any status fields as seen here.
This results in:
* status.observedGeneration is never set
* status.reconciliationStatus.lastReconciledSpec is never set
* status.lifecycleState remains empty instead of showing SUSPENDED
* isBeforeFirstDeployment() returns true on every reconciliation loop
{+}FlinkBlueGreenDeployment{+}:
FlinkBlueGreenDeploymentStatus does not have an observedGeneration field in its
status class. Additionally, when InitializingBlueStateHandler blocks on a
suspended initial state, it does not record lastReconciledSpec.
h3. *Expected Behaviour*
{+}FlinkDeployment{+}:
When a FlinkDeployment is created with spec.job.state: suspended, the operator
should acknowledge the spec without deploying any Flink resources (no JM pods,
no TM pods). Specifically:
* status.observedGeneration should be set to match metadata.generation,
signalling that the operator has processed the spec
* status.reconciliationStatus.lastReconciledSpec should be recorded with
state: SUSPENDED
* status.lifecycleState should show SUSPENDED
* A subsequent change to spec.job.state: running should trigger a normal first
deployment
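The FlinkDeployment behaviour listed above could be modelled roughly as follows. This is a simplified sketch under stated assumptions, not the operator's code: the {{Resource}} class, its fields, the {{reconcile}} method and the {{DEPLOYED}} placeholder value are all hypothetical stand-ins for the real spec and status types.

```java
/**
 * Simplified, hypothetical model of the expected behaviour. None of these
 * types are the operator's real classes.
 */
final class SuspendedFirstDeploymentSketch {

    /** Stand-in for the resource: metadata.generation, spec.job.state and status. */
    static final class Resource {
        long generation = 1;
        String specJobState = "suspended";
        Long observedGeneration;     // status.observedGeneration
        String lastReconciledState;  // state recorded in lastReconciledSpec
        String lifecycleState;       // status.lifecycleState
        boolean deployed;            // true once Flink resources were created
    }

    static void reconcile(Resource r) {
        boolean firstDeployment = !r.deployed;
        if (firstDeployment && "suspended".equals(r.specJobState)) {
            // Acknowledge the spec but create no JM or TM pods.
            r.lastReconciledState = "SUSPENDED";
            r.lifecycleState = "SUSPENDED";
        } else if (firstDeployment) {
            // Normal first deployment (details omitted in this sketch).
            r.deployed = true;
            r.lastReconciledState = "RUNNING";
            r.lifecycleState = "DEPLOYED"; // placeholder value for the sketch
        }
        // Recorded on every path, so tools can see the spec was processed.
        r.observedGeneration = r.generation;
    }

    public static void main(String[] args) {
        Resource r = new Resource();
        reconcile(r);
        // prints "1 SUSPENDED deployed=false"
        System.out.println(r.observedGeneration + " " + r.lifecycleState + " deployed=" + r.deployed);

        // The user switches spec.job.state to running, which bumps the generation.
        r.specJobState = "running";
        r.generation = 2;
        reconcile(r);
        // prints "2 DEPLOYED deployed=true"
        System.out.println(r.observedGeneration + " " + r.lifecycleState + " deployed=" + r.deployed);
    }
}
```

The point of the sketch is that the generation is recorded on the suspended path as well, instead of returning before any status update.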
{+}FlinkBlueGreenDeployment{+}:
* FlinkBlueGreenDeploymentStatus should include an observedGeneration field, set
on every status update
* lastReconciledSpec should be recorded when blocking on a suspended initial
state
* A subsequent change to spec.job.state: running should trigger deployment
correctly
was:
h3. *Problem 1: FlinkDeployment — {{observedGeneration}} not updated when
suspended*
When a FlinkDeployment resource is created with {{spec.job.state: suspended}},
the Flink Kubernetes Operator does not update the
{{status.observedGeneration}} field or other status fields. This violates
Kubernetes API conventions and breaks integration with standard deployment
tools like Kapp that rely on {{observedGeneration}} to determine when a
controller has processed a spec change, leading such tools to hang indefinitely.
h3. *Problem 2: FlinkBlueGreenDeployment — no {{observedGeneration}} field at
all*
The FlinkBlueGreenDeployment resource does not have an {{observedGeneration}}
field in its status at all, meaning deployment tools can never determine
whether the BlueGreen controller has processed a given spec generation,
regardless of state.
h3. *Root Cause*
{+}FlinkDeployment{+}:
In the reconciliation logic for FlinkDeployment, when the operator detects a
first deployment with spec.job.state: suspended, it returns early without
updating any status fields as seen here.
This results in:
* status.observedGeneration is never set
* status.reconciliationStatus.lastReconciledSpec is never set
* status.lifecycleState remains empty instead of showing SUSPENDED
* isBeforeFirstDeployment() returns true on every reconciliation loop
{+}FlinkBlueGreenDeployment{+}:
FlinkBlueGreenDeploymentStatus does not have an observedGeneration field in its
status class. Additionally, when InitializingBlueStateHandler blocks on a
suspended initial state, it does not record lastReconciledSpec.
h3. *Expected Behaviour*
{+}FlinkDeployment{+}:
When a FlinkDeployment is created with spec.job.state: suspended, the operator
should acknowledge the spec without deploying any Flink resources (no JM pods,
no TM pods, no services). Specifically:
* status.observedGeneration should be set to match metadata.generation,
signaling that the operator has processed the spec
* status.reconciliationStatus.lastReconciledSpec should be recorded with
state: SUSPENDED
* status.lifecycleState should show SUSPENDED
* A subsequent change to spec.job.state: running should trigger a normal first
deployment
+FlinkBlueGreenDeployment:+
* FlinkBlueGreenDeploymentStatus should include an observedGeneration field,
set on every status update
* lastReconciledSpec should be recorded when blocking on a suspended initial
state
* A subsequent change to spec.job.state: running should trigger deployment
correctly
> Include `observedGeneration` for Suspended Flink Deployments
> ------------------------------------------------------------
>
> Key: FLINK-39243
> URL: https://issues.apache.org/jira/browse/FLINK-39243
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Affects Versions: kubernetes-operator-1.14.0
> Reporter: Yanis Djeridi
> Priority: Major
> Fix For: kubernetes-operator-1.14.0
>
> Original Estimate: 168h
> Remaining Estimate: 168h