Re: [DISCUSSION] Consider Flink operator having a way to monitor the status of bounded streaming jobs after they finish or error?

richard.su Thu, 07 Dec 2023 00:31:27 -0800

Hi Gyula, our Flink version is 1.14.
It is hard for us to upgrade because we have users on our platform.
Sorry, I had not noticed this configuration. It is confusing because the Flink
operator announces support for Flink 1.13 through 1.17/1.18.


Is there another solution that would work in our situation?

Thanks 
Richard Su

> On Dec 7, 2023, at 16:22, Gyula Fóra <gyula.f...@gmail.com> wrote:
> 
> Hi!
> 
> What Flink version are you using?
> The operator always sets: execution.shutdown-on-application-finish to false
> so that finished / failed application clusters should not exit immediately
> and we can observe them.
> 
> This is however only available in Flink 1.15 and above.
> 
> Cheers,
> Gyula
> 
> On Thu, Dec 7, 2023 at 9:15 AM richard.su <richardsuc...@gmail.com> wrote:
> 
>> Hi Community, I have run into this issue, but I'm not sure whether it
>> has a solution. I have tried Flink operator 1.6, and the issue still
>> exists there.
>> 
>> If there is no solution, I think we could create a Jira issue to follow up.
>> 
>> When we create a bounded streaming job that eventually reaches the
>> Finished status, Flink shuts down the Kubernetes cluster once the job
>> goes from Running to Finished. This happens in the flink-kubernetes
>> package, in the deregisterApplication method of class
>> KubernetesResourceManagerDriver, which deletes the JM deployment directly,
>> within about a second (in our env).
>> With our operator config, however, the observer interval is 15s when the
>> JM deployment status is Ready and no savepoint is in progress, which means
>> the operator never observes the job status change.
>> So if the job failed rather than finished, we cannot tell the difference.
>> All we know is that the JM deployment is Missing and the job status is
>> Reconciling.
>> We want to integrate the Flink operator into our platform, but it cannot
>> monitor the real job status, which is weird.
>> 
>> Maybe this is still related to the cleanup logic of Flink native mode.
>> From my side, it is hard to handle this situation on the operator side,
>> because we cannot directly get the container's exit code when the pod is
>> missing and the JM deployment is missing.
>> 
>> Thanks for taking the time to read this issue.
>> Richard Su
>>> 
>>> On Dec 6, 2023, at 13:34, richard.su <richardsuc...@gmail.com> wrote:
>>> 
>>> More information to reproduce this problem:
>>> 
>>> version: flink operator 1.4
>>> mode: native
>>> job: wordcount
>>> language: java
>>> type: FlinkDeployment
>>> 
>>>> On Dec 6, 2023, at 10:52, richard.su <richardsuc...@gmail.com> wrote:
>>>> 
>>>> Hi Community, the default configuration of the Flink operator is:
>>>> 
>>>> kubernetes.operator.reconcile.interval: 15s
>>>> kubernetes.operator.observer.progress-check.interval: 5s
>>>> 
>>>> When a bounded streaming job has already stopped or failed, the JM
>>>> deployment stays Missing. If I set the configuration:
>>>> 
>>>> kubernetes.operator.jm-deployment-recover.enabled: false
>>>> 
>>>> then the Flink operator can only observe the job status as Reconciling
>>>> and the JM deployment status as Missing.
>>>> 
>>>> We cannot check whether the Flink job finished or failed, because by the
>>>> time the observer.progress-check interval elapses, the Flink web UI is
>>>> already down.
>>>> 
>>>> So we hope someone in the community can show us a way to monitor a
>>>> bounded streaming job's status.
>>>> 
>>>> Thanks.
>>>> 
>>>> Richard Su
>>> 
>> 
>>
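
For reference, the option named in Gyula's reply could be written out in a FlinkDeployment roughly as follows. This is only a sketch under stated assumptions: the resource name and the flinkVersion value are placeholders, and the fields a real deployment needs (image, job spec, resources) are omitted. According to the reply, the operator already sets this option itself, so adding it by hand should not normally be necessary, and it only takes effect on Flink 1.15 and above, so it would not help a 1.14 job.

```yaml
# Illustrative fragment only; not a complete FlinkDeployment.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: wordcount-example        # hypothetical name
spec:
  flinkVersion: v1_15            # placeholder; the option below needs Flink 1.15+
  flinkConfiguration:
    # Keep the application cluster up after the job finishes or fails,
    # so the terminal job status can still be observed.
    execution.shutdown-on-application-finish: "false"
```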
