[
https://issues.apache.org/jira/browse/FLINK-39617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18078728#comment-18078728
]
Herbert Wang edited comment on FLINK-39617 at 5/6/26 4:18 PM:
--------------------------------------------------------------
I am opening this ticket to discuss the API shape before submitting any
implementation. The proposed change is purely additive.
was (Author: JIRAUSER309329):
I am opening this ticket to discuss the API shape before submitting any
implementation.
> Add batch REST endpoints for aggregated subtask metrics across multiple job
> vertices
> --------------------------------------------------------------------------------------
>
> Key: FLINK-39617
> URL: https://issues.apache.org/jira/browse/FLINK-39617
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Metrics, Runtime / REST
> Reporter: Herbert Wang
> Priority: Major
> Labels: Metrics, metrics, rest_api
>
> The JobManager REST API currently exposes aggregated subtask metrics per job
> vertex via:
> {code}
> GET /jobs/:jobid/vertices/:vertexid/subtasks/metrics
> {code}
> Clients that need the same metric set for many vertices, such as autoscalers
> or monitoring integrations, must issue one request per vertex for metric-name
> discovery and another request per vertex for metric values. For jobs with
> many vertices this creates avoidable REST fan-out, repeated MetricFetcher
> updates, and large repeated payloads.
> h2. Proposal
> Add two batch JobManager REST endpoints for aggregated subtask metrics across
> multiple vertices:
> {code}
> POST /jobs/:jobid/vertices/subtasks/metrics/names
> POST /jobs/:jobid/vertices/subtasks/metrics/values
> {code}
> The existing single-vertex endpoint should remain unchanged for compatibility.
> The endpoints are intentionally split rather than using one POST endpoint
> with mode-switching behavior, so OpenAPI schemas, code generation, and
> capability detection remain straightforward.
> h3. Name discovery endpoint
> Request:
> {code:json}
> {
> "vertexIds": ["<jobVertexId>", "<jobVertexId>"],
> "regex": [".*busyTime.*", ".*numRecords.*"]
> }
> {code}
> Response:
> {code:json}
> [
> {
> "vertexId": "<jobVertexId>",
> "metrics": [{ "id": "busyTimeMsPerSecond" }]
> }
> ]
> {code}
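A minimal sketch of the name-discovery semantics described above, not the proposed implementation. It assumes that the entries in "regex" are OR-ed together and must match the whole metric name, and that the server already knows the metric names per vertex; the function name `discover_names` and the `available` mapping are hypothetical.

```python
import re


def discover_names(available, vertex_ids, regexes):
    """Return metric names per vertex in the proposed response shape.

    available:  hypothetical dict of vertexId -> list of known metric names
    vertex_ids: the "vertexIds" field of the request
    regexes:    the "regex" field of the request
    """
    # Assumption: a name is returned if it fully matches ANY of the patterns.
    patterns = [re.compile(r) for r in regexes]
    response = []
    for vertex_id in vertex_ids:
        names = sorted(
            name
            for name in available.get(vertex_id, [])
            if any(p.fullmatch(name) for p in patterns)
        )
        response.append(
            {"vertexId": vertex_id, "metrics": [{"id": n} for n in names]}
        )
    return response
```

Unknown vertex IDs yield an empty "metrics" list in this sketch; whether the real endpoint should do that or return an error is part of the API shape still under discussion.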
> h3. Value aggregation endpoint
> Request:
> {code:json}
> {
> "vertices": [
> { "vertexId": "<jobVertexId>", "metrics": ["busyTimeMsPerSecond"] },
> { "vertexId": "<jobVertexId>", "metrics": ["numRecordsInPerSecond"] }
> ],
> "agg": ["min", "max", "avg"]
> }
> {code}
> Response:
> {code:json}
> [
> {
> "vertexId": "<jobVertexId>",
> "metrics": [{ "id": "busyTimeMsPerSecond", "min": 0.0, "max": 1.0, "avg": 0.5 }]
> }
> ]
> {code}
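To make the value-aggregation request and response shapes concrete, here is a rough sketch that folds per-subtask values into the requested aggregates. It covers only the three aggregations shown in the example ("min", "max", "avg"); the `subtask_values` mapping and the function name are hypothetical, and skipping metrics without values is an assumption rather than specified behavior.

```python
# Aggregations used in the example request; the real endpoint may support more.
AGGREGATIONS = {
    "min": min,
    "max": max,
    "avg": lambda values: sum(values) / len(values),
}


def aggregate_vertices(subtask_values, request):
    """Build the proposed response from per-subtask metric values.

    subtask_values: hypothetical dict of
                    vertexId -> metric name -> list of per-subtask floats
    request:        parsed request body with "vertices" and "agg"
    """
    response = []
    for vertex in request["vertices"]:
        vertex_id = vertex["vertexId"]
        metrics = []
        for name in vertex["metrics"]:
            values = subtask_values.get(vertex_id, {}).get(name)
            if not values:
                # Assumption: metrics with no reported values are omitted.
                continue
            entry = {"id": name}
            for agg in request["agg"]:
                entry[agg] = AGGREGATIONS[agg](values)
            metrics.append(entry)
        response.append({"vertexId": vertex_id, "metrics": metrics})
    return response
```

Each vertex carries its own metric list while "agg" is shared across the whole request, which mirrors the request body above.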
> h2. Compatibility
> This is additive. The existing endpoint remains unchanged:
> {code}
> GET /jobs/:jobid/vertices/:vertexid/subtasks/metrics
> {code}
> Clients can feature-detect the new endpoints and fall back to the existing
> per-vertex endpoint when they are unavailable. Alternatively, the change
> could be cherry-picked to earlier 2.x release branches.
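One way a client might implement the feature detection and fallback mentioned above, sketched under several assumptions: a cluster without the batch endpoint answers with HTTP 404 or 405, the existing per-vertex endpoint accepts comma-separated `get` and `agg` query parameters, and `post` / `get` are hypothetical transport callables returning `(status, parsed_json)` so the logic stays independent of any HTTP library.

```python
def fetch_aggregated_values(post, get, job_id, vertices, aggs):
    """Try the proposed batch endpoint, else fall back to one GET per vertex.

    post(path, json_body) and get(path, query_params) are caller-supplied
    callables that return a (status_code, parsed_json) tuple.
    """
    batch_path = f"/jobs/{job_id}/vertices/subtasks/metrics/values"
    status, body = post(batch_path, {"vertices": vertices, "agg": aggs})
    if status == 200:
        return body
    # Assumption: 404/405 means the batch endpoint does not exist here.
    if status not in (404, 405):
        raise RuntimeError(f"batch request failed with HTTP {status}")

    # Fallback: the existing per-vertex endpoint, one request per vertex.
    results = []
    for vertex in vertices:
        path = f"/jobs/{job_id}/vertices/{vertex['vertexId']}/subtasks/metrics"
        params = {"get": ",".join(vertex["metrics"]), "agg": ",".join(aggs)}
        status, body = get(path, params)
        if status != 200:
            raise RuntimeError(f"per-vertex request failed with HTTP {status}")
        # Wrap the per-vertex result so both paths return the same shape.
        results.append({"vertexId": vertex["vertexId"], "metrics": body})
    return results
```

A client would typically cache the outcome of the first probe so that it does not pay for a failing POST on every polling cycle.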
--
This message was sent by Atlassian Jira
(v8.20.10#820010)