Ahmed Hamdy created FLINK-31998:
-----------------------------------

Summary: Flink Operator Deadlock on run job Failure
Key: FLINK-31998
URL: https://issues.apache.org/jira/browse/FLINK-31998
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.4.0, kubernetes-operator-1.3.0, kubernetes-operator-1.2.0
Reporter: Ahmed Hamdy
Fix For: kubernetes-operator-1.5.0
Attachments: gleek-m6pLe3Wy--IpCKQavAQwBQ.png

h2. Description

The Flink Operator reconciler can enter a deadlock in which it never updates a session job to DEPLOYED if {{deploy}} fails. The attached sequence diagram shows the issue: the FlinkSessionJob remains stuck in UPGRADING indefinitely.

h2. Proposed fix

The reconciler should roll back its changes to the CR if {{deploy()}} fails while {{reconciliationStatus.isBeforeFirstDeployment()}} is true.

[diagram|https://issues.apache.org/7239bb39-60d8-48a0-9052-f3231947edbe]
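
To make the proposed fix concrete, here is a minimal, self-contained sketch of the rollback idea. It is not the operator's code: {{Status}}, {{State}}, {{Deployer}} and {{reconcile}} are hypothetical stand-ins invented for illustration, and the sketch assumes that the status is written (state set to UPGRADING, spec recorded) before {{deploy()}} is attempted, which is what leaves the job stuck when the deploy throws.

```java
import java.util.Objects;

public class FirstDeployRollbackSketch {

    // Hypothetical states; PENDING stands for "nothing recorded yet".
    enum State { PENDING, UPGRADING, DEPLOYED }

    /** Hypothetical, heavily simplified stand-in for the CR's reconciliation status. */
    static final class Status {
        State state = State.PENDING;
        String lastReconciledSpec; // null until a spec has been recorded

        boolean isBeforeFirstDeployment() {
            return lastReconciledSpec == null;
        }
    }

    /** Hypothetical deploy hook standing in for the job submission. */
    interface Deployer {
        void deploy(String spec) throws Exception;
    }

    /** One reconcile pass. Returns true if the job is deployed afterwards. */
    static boolean reconcile(Status status, String desiredSpec, Deployer deployer) {
        if (!status.isBeforeFirstDeployment()
                && Objects.equals(status.lastReconciledSpec, desiredSpec)) {
            // The spec is already recorded, so this pass does nothing.
            // Without a rollback, a failed first deploy ends up here forever.
            return status.state == State.DEPLOYED;
        }

        // Snapshot the status so it can be restored if the first deploy fails.
        boolean firstDeployment = status.isBeforeFirstDeployment();
        State previousState = status.state;
        String previousSpec = status.lastReconciledSpec;

        // Assumed pre-deploy status update: mark UPGRADING and record the spec.
        status.state = State.UPGRADING;
        status.lastReconciledSpec = desiredSpec;
        try {
            deployer.deploy(desiredSpec);
            status.state = State.DEPLOYED;
            return true;
        } catch (Exception e) {
            if (firstDeployment) {
                // Proposed fix: undo the status change so the next pass retries.
                status.state = previousState;
                status.lastReconciledSpec = previousSpec;
            }
            // Failures of later upgrades are outside the scope of this sketch.
            return false;
        }
    }

    public static void main(String[] args) {
        Status status = new Status();
        reconcile(status, "spec-v1", spec -> { throw new IllegalStateException("submit failed"); });
        System.out.println("after failure: " + status.state);
        reconcile(status, "spec-v1", spec -> { });
        System.out.println("after retry: " + status.state);
    }
}
```

In this model the failed first pass restores the status to its pre-deploy snapshot, so {{isBeforeFirstDeployment()}} is still true on the next pass and the deploy is attempted again instead of being skipped.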