Prabhu Joseph created FLINK-32173:
-------------------------------------

             Summary: Flink Job Metrics returns stale values in the first request after an update in the values
                 Key: FLINK-32173
                 URL: https://issues.apache.org/jira/browse/FLINK-32173
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Metrics
    Affects Versions: 1.17.0
            Reporter: Prabhu Joseph

The Flink job metrics endpoint returns a stale value on the first request after a metric value has changed.

*Repro:*

1. Run a Flink job with the fixed-delay restart strategy and a large number of restart attempts.

{code}
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 10000

flink run -Dexecution.checkpointing.interval="10s" -d -c org.apache.flink.streaming.examples.wordcount.WordCount /usr/lib/flink/examples/streaming/WordCount.jar
{code}

2. Kill one of the TaskManagers, which triggers a job restart.

3. After the job has restarted, fetch any job metric. The first request returns the stale (older) value, 48.

{code}
[hadoop@ip-172-31-44-70 ~]$ curl http://jobmanager:52000/jobs/d24f7d74d541f1215a65395e0ebd898c/metrics?get=numRestarts | jq .
[
  {
    "id": "numRestarts",
    "value": "48"
  }
]
{code}

4. Subsequent requests return the correct value.

{code}
[hadoop@ip-172-31-44-70 ~]$ curl http://jobmanager:52000/jobs/d24f7d74d541f1215a65395e0ebd898c/metrics?get=numRestarts | jq .
[
  {
    "id": "numRestarts",
    "value": "49"
  }
]
{code}

5. Repeat steps 2 to 4. Each time, the first request after the metric changes returns the value from before the change, and only the next request returns the correct value.
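
The check in steps 3 and 4 can be scripted so that it is easy to rerun after each TaskManager kill. The following is a minimal sketch, not part of the original report: the helper names (`parse_metric`, `fetch_metric`, `first_read_is_stale`) are hypothetical, and it assumes only the endpoint path and JSON response shape shown in the curl output above.

```python
import json
import urllib.request
from typing import Callable, Optional


def parse_metric(body: str, metric_id: str) -> Optional[str]:
    """Extract one metric's value from the JSON array the endpoint returns."""
    for entry in json.loads(body):
        if entry.get("id") == metric_id:
            return entry.get("value")
    return None


def fetch_metric(base_url: str, job_id: str, metric_id: str) -> Optional[str]:
    """GET <base_url>/jobs/<job_id>/metrics?get=<metric_id>, like the curl calls."""
    url = f"{base_url}/jobs/{job_id}/metrics?get={metric_id}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_metric(resp.read().decode("utf-8"), metric_id)


def first_read_is_stale(fetch: Callable[[], Optional[str]]) -> bool:
    """Read the metric twice back to back and report whether the values differ.

    A difference is only a hint of staleness: the metric could also have
    genuinely changed between the two requests.
    """
    first = fetch()
    second = fetch()
    return first != second


# Example usage against the job from the report (requires a running cluster):
#
#   stale = first_read_is_stale(
#       lambda: fetch_metric(
#           "http://jobmanager:52000",
#           "d24f7d74d541f1215a65395e0ebd898c",
#           "numRestarts",
#       )
#   )
#   print("first read looked stale:", stale)
```

Taking the fetcher as a callable keeps the comparison logic separate from the HTTP call, so it can be exercised with canned values without a cluster.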