Yu Yang created FLINK-20829:
-------------------------------

Summary: flink.jm.downtime metric is inaccurate in flink 1.9.1 and 1.11.1
Key: FLINK-20829
URL: https://issues.apache.org/jira/browse/FLINK-20829
Project: Flink
Issue Type: Bug
Components: API / Scala, Runtime / Metrics
Affects Versions: 1.11.1, 1.9.1
Reporter: Yu Yang
Attachments: Screen Shot 2021-01-01 at 2.38.39 PM.png

According to the comment in [DownTimeGauge.java|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/metrics/DownTimeGauge.java#L28]:

"A gauge that returns (in milliseconds) how long a job has not been not running any more, in case it is in a failing/recovering situation. Running jobs return naturally a value of zero."

We noticed that the Flink runtime reports an inaccurate value for the flink.jm.downtime metric. What Flink reported was actually the uptime in milliseconds before the application restarted, not the time the job had spent not running.

!Screen Shot 2021-01-01 at 2.38.39 PM.png|width=720!
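
To make the discrepancy concrete, here is a minimal sketch of the two quantities involved. It is not Flink's DownTimeGauge code: the class DowntimeSketch, the Status enum, and both method names are hypothetical. It also assumes that the uptime-like value is measured from the moment the job entered the running state, which is one possible reading of the observed behavior rather than a confirmed cause.

```java
/** Hypothetical illustration only; this is not Flink's DownTimeGauge. */
public final class DowntimeSketch {

    /** Simplified stand-in for a job status (assumption: only two states matter here). */
    public enum Status { RUNNING, RESTARTING }

    /**
     * Semantics described in the quoted comment: zero while the job is running,
     * otherwise the time elapsed since the job stopped running.
     */
    public static long documentedDowntimeMillis(
            Status status, long stoppedRunningAtMillis, long nowMillis) {
        if (status == Status.RUNNING) {
            return 0L;
        }
        return Math.max(nowMillis - stoppedRunningAtMillis, 0L);
    }

    /**
     * Uptime-like value: the time elapsed since the job entered the running state.
     * Assumption: a value of this kind is what the metric appeared to show.
     */
    public static long millisSinceRunningStarted(long startedRunningAtMillis, long nowMillis) {
        return Math.max(nowMillis - startedRunningAtMillis, 0L);
    }

    public static void main(String[] args) {
        long startedRunningAt = 0L;          // job enters the running state
        long stoppedRunningAt = 3_600_000L;  // job fails after one hour
        long now = 3_605_000L;               // sampled five seconds into the restart

        // Expected by the documentation: about five seconds of downtime.
        System.out.println(
                documentedDowntimeMillis(Status.RESTARTING, stoppedRunningAt, now));

        // Uptime-like value: roughly the hour the job ran before restarting.
        System.out.println(millisSinceRunningStarted(startedRunningAt, now));
    }
}
```

With these sample timestamps the first value is 5000 and the second is 3605000. A gap of that size between the expected and the reported number is how a short restart can show up on a dashboard as a large "downtime".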