Re: [DISCUSSION] Consider Flink operator having a way to monitor the status of bounded streaming jobs after they finish or error?

Gyula Fóra Thu, 07 Dec 2023 00:23:26 -0800

Hi!

What Flink version are you using?
The operator always sets execution.shutdown-on-application-finish to false,
so that finished or failed application clusters do not exit immediately
and we can still observe them.

This option is, however, only available in Flink 1.15 and above.
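
A minimal sketch of the version dependency described above, not the
operator's actual code: it only models the statement that the option is
always set to false but takes effect from Flink 1.15 onward. The class and
method names are hypothetical, and the version string is assumed to be
purely numeric ("major.minor" or "major.minor.patch").

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ShutdownOnFinishSketch {
    // Option name taken from the message above.
    static final String KEY = "execution.shutdown-on-application-finish";

    // True if the given Flink version is 1.15 or newer, i.e. the option is available.
    // Assumes a numeric "major.minor[.patch]" string; suffixes are not handled.
    static boolean supportsOption(String flinkVersion) {
        String[] parts = flinkVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        return major > 1 || (major == 1 && minor >= 15);
    }

    // Copies the user config and forces the option to "false",
    // mirroring "the operator always sets" (illustration only).
    static Map<String, String> operatorStyleConfig(Map<String, String> userConfig) {
        Map<String, String> conf = new LinkedHashMap<>(userConfig);
        conf.put(KEY, "false");
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = operatorStyleConfig(new LinkedHashMap<>());
        for (String version : new String[] {"1.14.6", "1.15.0", "1.17.1"}) {
            System.out.println(version + ": " + KEY + "=" + conf.get(KEY)
                    + ", takes effect=" + supportsOption(version));
        }
    }
}
```

On a version below 1.15 the key is still present in the map, but per the
note above the cluster would be expected to shut down as before.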

Cheers,
Gyula

On Thu, Dec 7, 2023 at 9:15 AM richard.su <richardsuc...@gmail.com> wrote:

> Hi Community, I have run into this issue, but I'm not sure whether it
> has a solution. I have tried Flink operator 1.6, and the issue still
> exists there.
>
> If there is no solution, I think we could create a Jira issue to follow up.
>
> When we create a bounded streaming job, it eventually reaches the
> Finished status. Once the job's status changes from Running to Finished,
> Flink shuts down the cluster on Kubernetes: in the flink-kubernetes
> package, the deregisterApplication method of the
> KubernetesResourceManagerDriver class deletes the JM deployment directly,
> within about a second (in our environment).
> In our operator config, however, the observer interval is 15s when the JM
> deployment status is Ready and no savepoint is in progress, which means
> the operator will never observe the job status change.
> So if the job failed rather than finished, we cannot tell the difference.
> All we know is that the JM deployment is Missing and the job status is
> Reconciling.
> We want to integrate the Flink operator into our platform, but it
> cannot monitor the job's real status, which is weird.
>
> This may still be related to the cleanup logic of Flink native mode. From
> my side, this situation is hard to handle on the operator side, because
> we cannot directly get the container's exit code when the pod and the JM
> deployment are both missing.
>
> Thanks for taking the time to read this.
> Richard Su
> >
> > On Dec 6, 2023, at 13:34, richard.su <richardsuc...@gmail.com> wrote:
> >
> > More information for reproducing this problem:
> >
> > version: flink operator 1.4
> > mode: native
> > job: wordcount
> > language: java
> > type: FlinkDeployment
> >
> >> On Dec 6, 2023, at 10:52, richard.su <richardsuc...@gmail.com> wrote:
> >>
> >> Hi Community, the default configuration of the Flink operator is:
> >>
> >> kubernetes.operator.reconcile.interval: 15s
> >> kubernetes.operator.observer.progress-check.interval: 5s
> >>
> >> When a bounded streaming job has already stopped or failed, the JM
> >> deployment stays Missing if I set this configuration:
> >>
> >> kubernetes.operator.jm-deployment-recover.enabled: false
> >>
> >> With that setting, the Flink operator can only observe the job status as
> >> Reconciling and the JM deployment status as Missing.
> >>
> >> We cannot check whether the Flink job finished or failed, because by the
> >> time the observer.progress-check interval elapses, the Flink web UI is
> >> already down.
> >>
> >> So we hope someone in the community can show us a way to monitor a
> >> bounded streaming job's status.
> >>
> >> Thanks.
> >>
> >> Richard Su
> >
>
>
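
The timing argument in the quoted message (JM deployment deleted about one
second after the job ends, observer interval of 15s) can be put into
numbers with a small sketch. It assumes a simplified model that is not
taken from the operator's code: the operator polls at a fixed interval, the
job ends at a uniformly random moment within one interval, and the cluster
does shut down on finish as the reporter describes. The class and method
names are hypothetical.

```java
import java.time.Duration;

public class ObserveWindowSketch {
    // Fraction of possible finish instants for which the next poll still
    // reaches the JobManager before its deployment is deleted.
    // Simplified model: fixed polling interval, uniformly random finish time.
    static double chanceToObserve(Duration observeInterval, Duration jmLifetimeAfterFinish) {
        if (jmLifetimeAfterFinish.compareTo(observeInterval) >= 0) {
            return 1.0; // JM outlives a full interval, so some poll always sees it
        }
        return (double) jmLifetimeAfterFinish.toMillis() / observeInterval.toMillis();
    }

    public static void main(String[] args) {
        // Values from the thread: 15s observer interval, JM gone after about 1s.
        double p = chanceToObserve(Duration.ofSeconds(15), Duration.ofSeconds(1));
        System.out.println("chance to observe terminal status: " + p);
    }
}
```

Under these assumptions the terminal status is seen only when the job
happens to end in the last second before a poll, which matches the
reporter's experience that the status change is effectively never observed.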
