[ https://issues.apache.org/jira/browse/FLINK-34213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810006#comment-17810006 ]
Maximilian Michels commented on FLINK-34213:
Querying metrics separately for each vertex would be too expensive, but it
seems that is not necessary. Here is an example REST API response from the
{{/jobs/}} endpoint:
{noformat}
{
  "jid": "b4f918c2a0312de9fe7369a7db093e96",
  "name": "-",
  "isStoppable": false,
  "state": "RUNNING",
  "start-time": 1705094021727,
  "end-time": -1,
  "duration": 928985186,
  "maxParallelism": 1,
  "now": 1706023006913,
  "timestamps": {
    "SUSPENDED": 0,
    "RUNNING": 1705094036134,
    "FAILING": 0,
    "CANCELED": 0,
    "CANCELLING": 0,
    "CREATED": 1705094035034,
    "INITIALIZING": 1705094021727,
    "FAILED": 0,
    "RESTARTING": 0,
    "RECONCILING": 0,
    "FINISHED": 0
  },
  "vertices": [
    {
      "id": "db1f263dc155338dc2a9622a2e06d115",
      "name": "",
      "maxParallelism": 1,
      "parallelism": 18,
      "status": "RUNNING",
      "start-time": 1705094037437,
      "end-time": -1,
      "duration": 928969476,
      "tasks": {
        "CANCELED": 0,
        "DEPLOYING": 0,
        "CANCELING": 0,
        "RECONCILING": 0,
        "FINISHED": 0,
        "SCHEDULED": 0,
        "CREATED": 0,
        "INITIALIZING": 0,
        "FAILED": 0,
        "RUNNING": 18
      },
      "metrics": {
        "read-bytes": 0,
        "read-bytes-complete": true,
        "write-bytes": 2907138853415272,
        "write-bytes-complete": true,
        "read-records": 0,
        "read-records-complete": true,
        "write-records": 229589536334,
        "write-records-complete": true,
        "accumulated-backpressured-time": 1533744940,
        "accumulated-idle-time": 10026044858,
        "accumulated-busy-time": 5161601268
      }
    },
    ...
  ]
}
{noformat}
Note the accumulated backpressured, idle, and busy times reported under {{metrics}} for each vertex.
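
As a rough illustration of why a single response is enough, here is a minimal Python sketch that pulls the three accumulated time fields for every vertex out of one parsed response. The function name {{accumulated_times_by_vertex}} is hypothetical, and the sample is a trimmed copy of the response above (only the fields the sketch reads); this is not the autoscaler's actual code.

```python
import json

# Field names as they appear under "metrics" in the response above.
ACCUMULATED_KEYS = (
    "accumulated-busy-time",
    "accumulated-idle-time",
    "accumulated-backpressured-time",
)


def accumulated_times_by_vertex(job_details):
    """Map vertex id -> accumulated time fields from one parsed response.

    Hypothetical helper: a field missing from a vertex is reported as None.
    """
    result = {}
    for vertex in job_details.get("vertices", []):
        metrics = vertex.get("metrics", {})
        result[vertex["id"]] = {key: metrics.get(key) for key in ACCUMULATED_KEYS}
    return result


# Trimmed copy of the response above, kept to the fields used here.
sample = json.loads("""
{
  "jid": "b4f918c2a0312de9fe7369a7db093e96",
  "vertices": [
    {
      "id": "db1f263dc155338dc2a9622a2e06d115",
      "parallelism": 18,
      "metrics": {
        "accumulated-backpressured-time": 1533744940,
        "accumulated-idle-time": 10026044858,
        "accumulated-busy-time": 5161601268
      }
    }
  ]
}
""")

times = accumulated_times_by_vertex(sample)
print(times["db1f263dc155338dc2a9622a2e06d115"]["accumulated-busy-time"])
```

One request therefore covers all vertices of the job, with no per-vertex metric queries.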
> Consider using accumulated busy time instead of busyMsPerSecond
> ---
>
> Key: FLINK-34213
> URL: https://issues.apache.org/jira/browse/FLINK-34213
> Project: Flink
> Issue Type: Improvement
> Components: Autoscaler, Kubernetes Operator
> Reporter: Maximilian Michels
> Priority: Minor
>
> We might achieve much better accuracy if we used the accumulated busy time
> metrics from Flink instead of the momentary ones.
> We would use the difference between the previous and the current
> accumulated busy time.
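
A minimal Python sketch of the diff idea, under two assumptions that the issue does not state: the accumulated busy time is in milliseconds, and it is summed over all subtasks of the vertex (the sample figures in the comment are consistent with this, since busy + idle + backpressured time is roughly duration × parallelism). The function name and the second sample's values are hypothetical.

```python
def busy_ms_per_second(prev_busy_ms, curr_busy_ms, prev_now_ms, curr_now_ms, parallelism):
    """Average busy milliseconds per second and subtask between two samples.

    Assumes the accumulated value is in milliseconds and summed over all
    subtasks. Returns None if the counter went backwards (for example
    after a restart), because the diff is then meaningless.
    """
    elapsed_ms = curr_now_ms - prev_now_ms
    if elapsed_ms <= 0:
        raise ValueError("samples must be taken at increasing times")
    if parallelism <= 0:
        raise ValueError("parallelism must be positive")
    busy_delta_ms = curr_busy_ms - prev_busy_ms
    if busy_delta_ms < 0:
        return None
    # Busy time per subtask, normalised to one second of wall-clock time.
    return busy_delta_ms / parallelism / (elapsed_ms / 1000.0)


# First sample: values from the response in the comment.
# Second sample: hypothetical values 60 seconds later.
rate = busy_ms_per_second(
    prev_busy_ms=5161601268,
    curr_busy_ms=5161601268 + 540000,
    prev_now_ms=1706023006913,
    curr_now_ms=1706023006913 + 60000,
    parallelism=18,
)
print(rate)  # 500.0
```

Because the result covers the whole interval between two samples, it is not affected by when exactly within that interval a momentary value such as busyMsPerSecond would have been read.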
--
This message was sent by Atlassian Jira
(v8.20.10#820010)