[ 
https://issues.apache.org/jira/browse/FLINK-39243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanis Djeridi updated FLINK-39243:
----------------------------------
    Description: 
h3. *Problem 1: FlinkDeployment — {{observedGeneration}} not updated when 
suspended*

When a FlinkDeployment resource is created with {{{}spec.job.state: 
suspended{}}}, the Flink Kubernetes Operator does not update the 
{{status.observedGeneration}} field or other status fields. This violates 
Kubernetes API conventions and breaks integration with standard deployment 
tools like Kapp that rely on {{observedGeneration}} to determine when a 
controller has processed a spec change, leading such tools to hang indefinitely.
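
For illustration, the wait condition that such tools apply can be pictured roughly as the check below. This is a minimal sketch, not Kapp's actual implementation: the class {{GenerationCheck}} and the method {{isSpecProcessed}} are hypothetical names, and the only assumption taken from the convention is that a controller copies metadata.generation into status.observedGeneration once it has processed the spec.

```java
/** Hypothetical helper for illustration only; not part of Kapp or the Flink Kubernetes Operator. */
final class GenerationCheck {
    private GenerationCheck() {}

    /**
     * Returns true once the controller has reported a generation at least as
     * new as the one currently stored in metadata.generation.
     *
     * @param metadataGeneration value of metadata.generation on the resource
     * @param observedGeneration value of status.observedGeneration, or null if the field was never set
     */
    static boolean isSpecProcessed(long metadataGeneration, Long observedGeneration) {
        // A missing field means the controller has not acknowledged any spec yet.
        return observedGeneration != null && observedGeneration >= metadataGeneration;
    }

    public static void main(String[] args) {
        // Suspended FlinkDeployment as described above: the field is never set,
        // so the check never becomes true and a waiting tool keeps polling.
        System.out.println(isSpecProcessed(1L, null));
        // Controller has caught up with the current spec.
        System.out.println(isSpecProcessed(1L, 1L));
    }
}
```

With the behaviour described in this issue, the first case applies on every poll, which is why the tool never finishes waiting.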
h3. *Problem 2: FlinkBlueGreenDeployment — no {{observedGeneration}} field at 
all*

The FlinkBlueGreenDeployment resource does not have an {{observedGeneration}} 
field in its status at all, meaning deployment tools can never determine 
whether the BlueGreen controller has processed a given spec generation, 
regardless of state.
h3. *Root Cause*

{+}FlinkDeployment{+}:

In the reconciliation logic for FlinkDeployment, when the operator detects a 
first deployment with spec.job.state: suspended, it returns early without 
updating any status fields as seen here.

This results in:
 * status.observedGeneration is never set
 * status.reconciliationStatus.lastReconciledSpec is never set
 * status.lifecycleState remains empty instead of showing SUSPENDED
 * isBeforeFirstDeployment() returns true on every reconciliation loop

 
{+}FlinkBlueGreenDeployment{+}:

FlinkBlueGreenDeploymentStatus does not have an observedGeneration field in its 
status class. Additionally, when InitializingBlueStateHandler blocks on a 
suspended initial state, it does not record lastReconciledSpec.
 
h3. *Expected Behaviour*

 
{+}FlinkDeployment{+}:

When a FlinkDeployment is created with spec.job.state: suspended, the operator 
should acknowledge the spec without deploying any Flink resources (no JM pods, 
no TM pods). Specifically:
 * status.observedGeneration should be set to match metadata.generation, 
signalling that the operator has processed the spec
 * status.reconciliationStatus.lastReconciledSpec should be recorded with 
state: SUSPENDED
 * status.lifecycleState should show SUSPENDED

 * A subsequent change to spec.job.state: running should trigger a normal first 
deployment
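
The acknowledgement described in the list above could look roughly like the following sketch. It uses simplified stand-in types rather than the operator's real classes: {{SuspendedAckSketch}}, its nested {{Status}} class, and {{acknowledgeIfSuspended}} are hypothetical names introduced only for this example, and the spec is represented as an opaque string.

```java
/** Illustrative sketch only; the types here are stand-ins, not the operator's real classes. */
final class SuspendedAckSketch {
    private SuspendedAckSketch() {}

    /** Hypothetical, flattened view of the status fields named in this issue. */
    static final class Status {
        Long observedGeneration;   // null until a spec has been acknowledged
        String lastReconciledSpec; // spec as last acknowledged by the controller
        String lifecycleState;     // null until the controller reports a state
    }

    /**
     * Records a suspended spec in the status without deploying anything.
     *
     * @return true if the spec was suspended and has been acknowledged,
     *         false if the caller should continue with a normal deployment
     */
    static boolean acknowledgeIfSuspended(
            long metadataGeneration, String jobState, String serializedSpec, Status status) {
        if (!"suspended".equalsIgnoreCase(jobState)) {
            return false;
        }
        // Update the status before returning early, so that tools watching
        // observedGeneration can see that the spec was processed.
        status.observedGeneration = metadataGeneration;
        status.lastReconciledSpec = serializedSpec;
        status.lifecycleState = "SUSPENDED";
        return true;
    }

    public static void main(String[] args) {
        Status status = new Status();
        boolean acknowledged = acknowledgeIfSuspended(1L, "suspended", "{\"state\":\"SUSPENDED\"}", status);
        System.out.println(acknowledged + " " + status.observedGeneration + " " + status.lifecycleState);
    }
}
```

The point of the sketch is the ordering: the status is written before the early return, whereas the behaviour reported here returns without touching the status.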

 
{+}FlinkBlueGreenDeployment{+}:

 * FlinkBlueGreenDeploymentStatus should include an observedGeneration field, 
set on every status update
 * lastReconciledSpec should be recorded when blocking on a suspended initial 
state

 * A subsequent change to spec.job.state: running should trigger deployment 
correctly

  was:
h3. *Problem 1: FlinkDeployment — {{observedGeneration}} not updated when 
suspended*

When a FlinkDeployment resource is created with {{{}spec.job.state: 
suspended{}}}, the Flink Kubernetes Operator does not update the 
{{status.observedGeneration}} field or other status fields. This violates 
Kubernetes API conventions and breaks integration with standard deployment 
tools like Kapp that rely on {{observedGeneration}} to determine when a 
controller has processed a spec change, leading such tools to hang indefinitely.
h3. *Problem 2: FlinkBlueGreenDeployment — no {{observedGeneration}} field at 
all*

The FlinkBlueGreenDeployment resource does not have an {{observedGeneration}} 
field in its status at all, meaning deployment tools can never determine 
whether the BlueGreen controller has processed a given spec generation, 
regardless of state.
h3. *Root Cause*

{+}FlinkDeployment{+}:

In the reconciliation logic for FlinkDeployment, when the operator detects a 
first deployment with spec.job.state: suspended, it returns early without 
updating any status fields as seen here.

This results in: * status.observedGeneration is never set
 * status.reconciliationStatus.lastReconciledSpec is never set
 * status.lifecycleState remains empty instead of showing SUSPENDED
 * isBeforeFirstDeployment() returns true on every reconciliation loop

 
{+}FlinkBlueGreenDeployment{+}:

FlinkBlueGreenDeploymentStatus does not have an observedGeneration field in its 
status class. Additionally, when InitializingBlueStateHandler blocks on a 
suspended initial state, it does not record lastReconciledSpec.
 
h3. *Expected Behaviour*
 
{+}FlinkDeployment{+}:

When a FlinkDeployment is created with spec.job.state: suspended, the operator 
should acknowledge the spec without deploying any Flink resources (no JM pods, 
no TM pods, no services). Specifically: * status.observedGeneration should be 
set to match metadata.generation, signaling that the operator has processed the 
spec
 * status.reconciliationStatus.lastReconciledSpec should be recorded with 
state: SUSPENDED
 * status.lifecycleState should show SUSPENDED

 * A subsequent change to spec.job.state: running should trigger a normal first 
deployment

 
+FlinkBlueGreenDeployment:+
  * FlinkBlueGreenDeploymentStatus should include an observedGeneration field, 
set on every status update
 * lastReconciledSpec should be recorded when blocking on a suspended initial 
state

 * A subsequent change to spec.job.state: running should trigger deployment 
correctly


> Include `observedGeneration` for Suspended Flink Deployments
> ------------------------------------------------------------
>
>                 Key: FLINK-39243
>                 URL: https://issues.apache.org/jira/browse/FLINK-39243
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.14.0
>            Reporter: Yanis Djeridi
>            Priority: Major
>             Fix For: kubernetes-operator-1.14.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> h3. *Problem 1: FlinkDeployment — {{observedGeneration}} not updated when 
> suspended*
> When a FlinkDeployment resource is created with {{{}spec.job.state: 
> suspended{}}}, the Flink Kubernetes Operator does not update the 
> {{status.observedGeneration}} field or other status fields. This violates 
> Kubernetes API conventions and breaks integration with standard deployment 
> tools like Kapp that rely on {{observedGeneration}} to determine when a 
> controller has processed a spec change, leading such tools to hang 
> indefinitely.
> h3. *Problem 2: FlinkBlueGreenDeployment — no {{observedGeneration}} field at 
> all*
> The FlinkBlueGreenDeployment resource does not have an {{observedGeneration}} 
> field in its status at all, meaning deployment tools can never determine 
> whether the BlueGreen controller has processed a given spec generation, 
> regardless of state.
> h3. *Root Cause*
> {+}FlinkDeployment{+}:
> In the reconciliation logic for FlinkDeployment, when the operator detects a 
> first deployment with spec.job.state: suspended, it returns early without 
> updating any status fields as seen here.
> This results in:
>  * status.observedGeneration is never set
>  * status.reconciliationStatus.lastReconciledSpec is never set
>  * status.lifecycleState remains empty instead of showing SUSPENDED
>  * isBeforeFirstDeployment() returns true on every reconciliation loop
>  
> {+}FlinkBlueGreenDeployment{+}:
> FlinkBlueGreenDeploymentStatus does not have an observedGeneration field in 
> its status class. Additionally, when InitializingBlueStateHandler blocks on a 
> suspended initial state, it does not record lastReconciledSpec.
>  
> h3. *Expected Behaviour*
>  
> {+}FlinkDeployment{+}:
> When a FlinkDeployment is created with spec.job.state: suspended, the 
> operator should acknowledge the spec without deploying any Flink resources 
> (no JM pods, no TM pods). Specifically:
>  * status.observedGeneration should be set to match metadata.generation, 
> signalling that the operator has processed the spec
>  * status.reconciliationStatus.lastReconciledSpec should be recorded with 
> state: SUSPENDED
>  * status.lifecycleState should show SUSPENDED
>  * A subsequent change to spec.job.state: running should trigger a normal 
> first deployment
>  
> {+}FlinkBlueGreenDeployment{+}:
>  * FlinkBlueGreenDeploymentStatus should include an observedGeneration field, 
> set on every status update
>  * lastReconciledSpec should be recorded when blocking on a suspended initial 
> state
>  * A subsequent change to spec.job.state: running should trigger deployment 
> correctly



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
