[ https://issues.apache.org/jira/browse/FLINK-31998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17722434#comment-17722434 ]
Zhenqiu Huang commented on FLINK-31998:
---------------------------------------

[~gyfora] Technically, when a session job is created, what actually exists is a session cluster that can run multiple jobs in parallel or sequentially. But in the session job CRD, the cluster-to-job mapping is 1 to 1. We probably need to adjust the CRD to decouple the job status from the session cluster status.

> Flink Operator Deadlock on run job Failure
> ------------------------------------------
>
>                 Key: FLINK-31998
>                 URL: https://issues.apache.org/jira/browse/FLINK-31998
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.2.0, kubernetes-operator-1.3.0, kubernetes-operator-1.4.0
>            Reporter: Ahmed Hamdy
>            Priority: Major
>             Fix For: kubernetes-operator-1.6.0
>
>         Attachments: gleek-m6pLe3Wy--IpCKQavAQwBQ.png
>
>
> h2. Description
> The FlinkOperator Reconciler goes into a deadlock in which it never updates the session job to DEPLOYED/ROLLED_BACK if {{deploy}} fails.
> The attached sequence diagram shows the issue, where the FlinkSessionJob is stuck in UPGRADING indefinitely.
> h2. Proposed fix
> The Reconciler should roll back the CR changes if {{deploy()}} fails while {{reconciliationStatus.isBeforeFirstDeployment()}} is true.
> [diagram|https://issues.apache.org/7239bb39-60d8-48a0-9052-f3231947edbe]
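
For illustration only, the proposed fix could be sketched roughly as below. This is a toy model, not the operator's code: `RollbackSketch`, `Status`, `State`, `Deployer`, and `reconcile` are all hypothetical names invented for this sketch, and only `isBeforeFirstDeployment()`, `deploy`, and the UPGRADING/DEPLOYED/ROLLED_BACK states are taken from the issue text. The idea shown is that a failed first deployment moves the status to a terminal state instead of leaving it in UPGRADING.

```java
// Toy model of the proposed fix; every type here is hypothetical and
// does not correspond to the real Flink Kubernetes Operator classes.
class RollbackSketch {

    enum State { UPGRADING, DEPLOYED, ROLLED_BACK }

    static final class Status {
        State state = State.UPGRADING;
        String lastStableSpec; // null until a deployment has succeeded
        String error;          // last deployment error, if any

        boolean isBeforeFirstDeployment() {
            return lastStableSpec == null;
        }
    }

    interface Deployer {
        void deploy(String spec) throws Exception;
    }

    static void reconcile(Status status, String desiredSpec, Deployer deployer) {
        status.state = State.UPGRADING;
        try {
            deployer.deploy(desiredSpec);
            status.lastStableSpec = desiredSpec;
            status.error = null;
            status.state = State.DEPLOYED;
        } catch (Exception e) {
            if (status.isBeforeFirstDeployment()) {
                // Without this branch the status would stay in UPGRADING forever.
                status.error = e.getMessage();
                status.state = State.ROLLED_BACK;
            } else {
                // Failures after a successful deployment are out of scope here.
                throw new IllegalStateException("deploy failed", e);
            }
        }
    }

    public static void main(String[] args) {
        Status status = new Status();
        reconcile(status, "spec-v1", spec -> { throw new Exception("submit failed"); });
        System.out.println(status.state);
    }
}
```

In this sketch a later `reconcile` call with a working `Deployer` can still move the status to DEPLOYED, since the rolled-back status records no stable spec and is not left stuck in UPGRADING.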