Gyula Fora created FLINK-26345:
----------------------------------
Summary: Observer should detect flink job even if deployment status is empty
Key: FLINK-26345
URL: https://issues.apache.org/jira/browse/FLINK-26345
Project: Flink
Issue Type: Bug
Reporter: Gyula Fora
Currently it is possible to get into a corner case where the job is submitted
by the reconciler but the deployment status is not updated to reflect the
submission.
In these cases the observer does not attempt to "recover" the cluster; it
simply skips the observation step, assuming that the job is not running (status
== null).
However, this means that the reconciler will try to submit the job again,
leading to the error:
{code:java}
org.apache.flink.client.deployment.ClusterDeploymentException: The Flink cluster job-name already exists.
    at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deployApplicationCluster(KubernetesClusterDescriptor.java:179)
    at org.apache.flink.client.deployment.application.cli.ApplicationClusterDeployer.run(ApplicationClusterDeployer.java:67)
    at org.apache.flink.kubernetes.operator.service.FlinkService.submitApplicationCluster(FlinkService.java:73)
    at org.apache.flink.kubernetes.operator.reconciler.JobReconciler.deployFlinkJob(JobReconciler.java:123)
    at org.apache.flink.kubernetes.operator.reconciler.JobReconciler.reconcile(JobReconciler.java:65)
    at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcileFlinkDeployment(FlinkDeploymentController.java:126)
    at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:102)
    at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:51)
{code}
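A rough sketch of the behaviour the summary asks for, not the operator's actual
code: every name here (ObserverSketch, DeploymentStatus, clusterProbe,
shouldSubmit) is hypothetical and only stands in for the real observer,
status and Flink service classes. The idea is that an empty status is not taken
as proof that nothing is running; the observer first asks the cluster, and if a
job is found it records it so that the reconciler does not submit again.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names come from the operator code base. */
final class ObserverSketch {

    /** Possible outcomes of a single observation pass. */
    enum Observation { NOT_DEPLOYED, RECOVERED, OBSERVED }

    /** Minimal stand-in for the status stored on the deployment resource. */
    static final class DeploymentStatus {
        String jobState; // null means no submission has been recorded
    }

    /**
     * Observes the deployment. The clusterProbe is an assumed callback that
     * returns the live job state, or empty if no cluster exists.
     */
    static Observation observe(DeploymentStatus status,
                               Supplier<Optional<String>> clusterProbe) {
        Optional<String> live = clusterProbe.get();
        if (status.jobState == null) {
            if (!live.isPresent()) {
                // Nothing recorded and nothing running: genuinely not deployed.
                return Observation.NOT_DEPLOYED;
            }
            // Job was submitted but the status update was lost: recover it.
            status.jobState = live.get();
            return Observation.RECOVERED;
        }
        // Normal path: refresh the recorded state if the cluster answered.
        live.ifPresent(state -> status.jobState = state);
        return Observation.OBSERVED;
    }

    /** The reconciler would only submit when no job has been recorded. */
    static boolean shouldSubmit(DeploymentStatus status) {
        return status.jobState == null;
    }
}
```

With a check like this, the empty-status case no longer falls through to a
second deployApplicationCluster call, since shouldSubmit returns false once the
running job has been recovered.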
This is somewhat related to FLINK-26261, cc [~thw]