Re: Flink Prometheus metric doubt
In practice, the documentation is incorrect. While technically the metric _would_ emit -1 if the job is in a failed/finished state, in reality the metric has already been unregistered at that point and is no longer updated, since the owning component (the jobmanager) is shutting down. I can't think of a workaround for this problem at the moment.

On 19/12/2019 11:56, Jesús Vásquez wrote:
> Hi all, I'm monitoring Flink jobs using Prometheus. [...]
> Am I missing something, or is this the expected behavior?
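The explanation in the reply above (the gauge would return -1, but the metric is gone before anything reads it) can be illustrated with a small Python model. This is a sketch, not Flink's code. Only `uptime_gauge` tries to follow the documented uptime semantics (elapsed time while running, -1 for completed jobs). Returning 0 for other non-terminal states is an assumption. `Job`, `Registry` and `transition_to` are hypothetical stand-ins for the job state, the metric registry a Prometheus endpoint exposes, and the job shutdown.

```python
from enum import Enum
from typing import Callable, Dict

# Sentinel the documentation describes for completed jobs.
NO_LONGER_RUNNING = -1
UPTIME = "flink_jobmanager_job_uptime"


class JobStatus(Enum):
    RUNNING = "RUNNING"
    RESTARTING = "RESTARTING"
    FAILED = "FAILED"
    FINISHED = "FINISHED"

    def is_terminal(self) -> bool:
        return self in (JobStatus.FAILED, JobStatus.FINISHED)


class Job:
    """Hypothetical stand-in for a job's execution state."""

    def __init__(self, running_since_ms: int) -> None:
        self.status = JobStatus.RUNNING
        self.running_since_ms = running_since_ms


def uptime_gauge(job: Job, now_ms: int) -> int:
    # Documented semantics: elapsed time while RUNNING, -1 once completed.
    if job.status is JobStatus.RUNNING:
        return max(now_ms - job.running_since_ms, 0)
    if job.status.is_terminal():
        return NO_LONGER_RUNNING
    # Assumption: other (non-terminal, non-running) states report 0.
    return 0


class Registry:
    """Hypothetical registry: only registered gauges are ever scraped."""

    def __init__(self) -> None:
        self.gauges: Dict[str, Callable[[int], int]] = {}

    def register(self, name: str, gauge: Callable[[int], int]) -> None:
        self.gauges[name] = gauge

    def unregister(self, name: str) -> None:
        self.gauges.pop(name, None)

    def scrape(self, now_ms: int) -> Dict[str, int]:
        # Evaluate every currently registered gauge, as a scrape would.
        return {name: gauge(now_ms) for name, gauge in self.gauges.items()}


def transition_to(job: Job, registry: Registry, status: JobStatus) -> None:
    """Hypothetical shutdown: a terminal job's metrics are unregistered."""
    job.status = status
    if status.is_terminal():
        # Removed as the job shuts down, so no scrape ever sees -1.
        registry.unregister(UPTIME)


job = Job(running_since_ms=1_000)
registry = Registry()
registry.register(UPTIME, lambda now: uptime_gauge(job, now))

print(registry.scrape(now_ms=5_000))    # {'flink_jobmanager_job_uptime': 4000}
transition_to(job, registry, JobStatus.FAILED)
print(registry.scrape(now_ms=6_000))    # {}
print(uptime_gauge(job, now_ms=6_000))  # -1, but it is no longer exposed
```

In this model the -1 exists only inside the gauge logic. From the scraper's point of view the series simply stops appearing, which matches the behavior described in the original question.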
Re: Flink Prometheus metric doubt
Hi Jesus,

IMHO, maybe @Chesnay Schepler can provide more information.

Best,
Vino

On Thursday, 19 December 2019, 6:57 PM, Jesús Vásquez wrote:
> Hi all, I'm monitoring Flink jobs using Prometheus. [...]
> Am I missing something, or is this the expected behavior?
Flink Prometheus metric doubt
Hi all,

I'm monitoring Flink jobs using Prometheus. I have been trying to use the metrics flink_jobmanager_job_uptime and flink_jobmanager_job_downtime to create an alert that fires when one of these values emits -1, since the documentation says this is how the metrics behave once the job reaches a completed state. However, when I tested the behavior with one of my jobs failing, the mentioned metrics never emitted anything other than zero, and the metric eventually disappeared after the job had failed.

Am I missing something, or is this the expected behavior?