Hi!

What Flink version are you using? The operator always sets execution.shutdown-on-application-finish to false, so that finished or failed application clusters do not exit immediately and we can observe them.
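
As a rough sketch of what this means in practice, a FlinkDeployment along the following lines would end up with the option applied. The resource name, image tag, service account, resource sizes, and jar path are assumptions chosen for illustration. The option is written out only to make it visible; since the operator sets it on its own, it should not need to appear in your spec.

```yaml
# Illustrative sketch only: name, image, sizes and jar path are assumptions.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: wordcount-example          # hypothetical name
spec:
  image: flink:1.17                # assumed image tag
  flinkVersion: v1_17              # assumed Flink version
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    # Shown for visibility; the operator is said to set this automatically.
    execution.shutdown-on-application-finish: "false"
  serviceAccount: flink            # assumed service account
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/WordCount.jar  # assumed path
    parallelism: 2
    upgradeMode: stateless
```

With the cluster kept alive after the job ends, the final state should then be readable from the resource status, for example with kubectl get flinkdeployment wordcount-example -o jsonpath='{.status.jobStatus.state}'.
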
This option is, however, only available in Flink 1.15 and above.

Cheers,
Gyula

On Thu, Dec 7, 2023 at 9:15 AM richard.su <richardsuc...@gmail.com> wrote:

> Hi Community,
>
> I have found this issue, but I'm not sure whether it has a solution. I
> also tried Flink operator 1.6, and the issue still exists there.
>
> If there is no solution, I think we could create a Jira issue to follow up.
>
> When we create a bounded streaming job, it eventually reaches the Finished
> status. Once the job's status changes from Running to Finished, Flink
> shuts down the Kubernetes cluster: in the flink-kubernetes package, the
> deregisterApplication method of class KubernetesResourceManagerDriver
> deletes the JM deployment directly, within about a second (in our env).
>
> But with our operator config, when the JM deployment status is Ready and
> no savepoint is in progress, the observer interval is 15s, which means the
> operator will never observe the job status change.
>
> So if the job failed rather than finished, we cannot tell the difference.
> All we know is that the JM deployment is Missing and the job status is
> Reconciling.
>
> We want to integrate the Flink operator into our platform, but it cannot
> monitor the job's real status, which is weird.
>
> Maybe it is still related to the cleanup logic of Flink native mode. From
> my side, this is hard to handle in the operator, because we cannot
> directly get the exit code of the container when the pod and the JM
> deployment are missing.
>
> Thanks for taking the time to read this.
>
> Richard Su
>
>
> > On Dec 6, 2023, at 13:34, richard.su <richardsuc...@gmail.com> wrote:
> >
> > More information for reproducing this problem:
> >
> > version: flink operator 1.4
> > mode: native
> > job: wordcount
> > language: java
> > type: FlinkDeployment
> >
> >> On Dec 6, 2023, at 10:52, richard.su <richardsuc...@gmail.com> wrote:
> >>
> >> Hi Community, the default configuration of the Flink operator is:
> >>
> >> kubernetes.operator.reconcile.interval: 15s
> >> kubernetes.operator.observer.progress-check.interval: 5s
> >>
> >> When a bounded streaming job has already reached a stopped or error
> >> state, the JM deployment stays Missing. If I set:
> >>
> >> kubernetes.operator.jm-deployment-recover.enabled: false
> >>
> >> then the Flink operator can only observe the job status as Reconciling
> >> and the JM deployment status as Missing.
> >>
> >> We cannot check whether the Flink job finished or failed, because by
> >> the time the observer.progress-check interval elapses, the Flink web UI
> >> is already down.
> >>
> >> So we hope someone in the community can show us a way to monitor a
> >> bounded streaming job's status.
> >>
> >> Thanks.
> >>
> >> Richard Su