Prabhu Joseph created FLINK-32173:
-------------------------------------
Summary: Flink Job Metrics returns stale values in the first
request after an update in the values
Key: FLINK-32173
URL: https://issues.apache.org/jira/browse/FLINK-32173
Project: Flink
Issue Type: Bug
Components: Runtime / Metrics
Affects Versions: 1.17.0
Reporter: Prabhu Joseph
Flink job metrics return a stale value on the first request after the value has
changed.
*Repro:*
1. Run a Flink job with the fixed-delay restart strategy and multiple restart attempts
{code}
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 10000
flink run -Dexecution.checkpointing.interval="10s" -d -c \
  org.apache.flink.streaming.examples.wordcount.WordCount \
  /usr/lib/flink/examples/streaming/WordCount.jar
{code}
2. Kill one of the TaskManagers, which triggers a job restart.
3. After the job has restarted, fetch any job metric. The first request returns
the stale (older) value, 48.
{code}
[hadoop@ip-172-31-44-70 ~]$ curl http://jobmanager:52000/jobs/d24f7d74d541f1215a65395e0ebd898c/metrics?get=numRestarts | jq .
[
  {
    "id": "numRestarts",
    "value": "48"
  }
]
{code}
4. On subsequent requests, it returns the correct value, 49.
{code}
[hadoop@ip-172-31-44-70 ~]$ curl http://jobmanager:52000/jobs/d24f7d74d541f1215a65395e0ebd898c/metrics?get=numRestarts | jq .
[
  {
    "id": "numRestarts",
    "value": "49"
  }
]
{code}
5. Repeat steps 2 to 4. Each time, the first request after the metric changes
returns the value from before the change. Only the next request returns the
correct value.
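
The check in steps 3 and 4 could be scripted roughly as follows. This is only a sketch: the helper names `fetch_metric` and `classify`, the `REST_URL` and `JOB_ID` variables, and the `run` argument are assumptions made for illustration, and the host and job ID are simply the ones from the transcript above. It needs `curl` and `jq`. Note that two differing readings can also mean that a real restart happened between the two requests, so the result is only meaningful when the job is otherwise stable.

```shell
#!/bin/sh
# Sketch: compare two consecutive readings of a job metric after a restart.
# REST_URL and JOB_ID are assumed defaults taken from the transcript above.
REST_URL="${REST_URL:-http://jobmanager:52000}"
JOB_ID="${JOB_ID:-d24f7d74d541f1215a65395e0ebd898c}"

# Fetch the current value of one job metric through the REST API.
fetch_metric() {
  curl -s "${REST_URL}/jobs/${JOB_ID}/metrics?get=$1" | jq -r '.[0].value'
}

# Report whether the first reading differs from the second one.
classify() {
  if [ "$1" != "$2" ]; then
    echo "stale: first=$1 second=$2"
  else
    echo "consistent: $1"
  fi
}

# Only contact the cluster when explicitly asked to, e.g. "sh check.sh run".
if [ "${1:-}" = "run" ]; then
  first="$(fetch_metric numRestarts)"
  second="$(fetch_metric numRestarts)"
  classify "$first" "$second"
fi
```

Run right after the job has restarted, a `stale: ...` line would match the behavior described in this report, while `consistent: ...` would mean that the first request already returned the current value.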