[
https://issues.apache.org/jira/browse/FLINK-39376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-39376:
-----------------------------------
Labels: pull-request-available (was: )
> Show TaskManager IP address in Checkpoint Subtask Statistics
> ------------------------------------------------------------
>
> Key: FLINK-39376
> URL: https://issues.apache.org/jira/browse/FLINK-39376
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing
> Affects Versions: 2.1.1
> Environment: prod
> Reporter: jingzi
> Priority: Minor
> Labels: pull-request-available
>
> h1. Summary
> When diagnosing slow or failing checkpoints, operators need to identify
> which TaskManager hosts are responsible for high checkpoint latency or
> failures. Currently, the checkpoint subtask statistics table in the Flink Web
> UI (/jobs/<job-id>/checkpoints/subtask/<vertex-id>) shows
> per-subtask metrics (state size, duration, alignment, etc.) but does not
> include information about which TaskManager (host/IP) each subtask ran on.
> h1. Motivation
> - Disk I/O bottlenecks, network issues, or GC pressure on specific nodes
> are common root causes of slow checkpoints. Without host information,
> operators must cross-reference subtask indices with TaskManager assignment
> through a separate UI path.
> - Providing the IP/hostname directly in the subtask checkpoint statistics
> table reduces MTTR for checkpoint-related incidents.
> h1. Proposed Solution
> 1. Add an ip field to SubtaskStateStats, populated from TaskManagerLocation
> when the subtask's checkpoint acknowledgement is handled in PendingCheckpoint.
> 2. Expose the field in the REST API response via
> SubtaskCheckpointStatistics.CompletedSubtaskCheckpointStatistics.
> 3. Display a sortable "IP Address" column in the Web UI subtask checkpoint
> statistics table.
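
The three steps of the proposed solution can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual patch: the classes Location and SubtaskStats are simplified, hypothetical stand-ins for Flink's TaskManagerLocation and SubtaskStateStats, the JSON key "ip" and the "unknown" fallback are assumptions, and the sort is plain lexicographic ordering on the address string.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch only: simplified stand-ins, none of these are Flink's real classes. */
public class CheckpointIpSketch {

    /** Hypothetical stand-in for TaskManagerLocation; carries only an address. */
    public static final class Location {
        public final String address;

        public Location(String address) {
            this.address = address;
        }
    }

    /** Hypothetical stand-in for SubtaskStateStats with the proposed ip field. */
    public static final class SubtaskStats {
        public final int subtaskIndex;
        public final long durationMillis;
        public final String ip; // the proposed new field

        public SubtaskStats(int subtaskIndex, long durationMillis, String ip) {
            this.subtaskIndex = subtaskIndex;
            this.durationMillis = durationMillis;
            this.ip = ip;
        }
    }

    /** Step 1: when a subtask acknowledges, copy the address into its stats. */
    public static SubtaskStats acknowledge(int subtaskIndex, long durationMillis, Location location) {
        // Assumption: fall back to "unknown" when no location is available.
        String ip = (location == null) ? "unknown" : location.address;
        return new SubtaskStats(subtaskIndex, durationMillis, ip);
    }

    /** Step 2: expose the field in a REST-style JSON object (key name "ip" is assumed). */
    public static String toJson(SubtaskStats s) {
        return String.format(
                "{\"index\":%d,\"end_to_end_duration\":%d,\"ip\":\"%s\"}",
                s.subtaskIndex, s.durationMillis, s.ip);
    }

    /** Step 3: what a sortable column would do: order rows by ip, then by index. */
    public static List<SubtaskStats> sortedByIp(List<SubtaskStats> stats) {
        List<SubtaskStats> copy = new ArrayList<>(stats);
        copy.sort(Comparator.comparing((SubtaskStats s) -> s.ip)
                .thenComparingInt(s -> s.subtaskIndex));
        return copy;
    }

    public static void main(String[] args) {
        List<SubtaskStats> stats = new ArrayList<>();
        stats.add(acknowledge(0, 120L, new Location("10.0.0.7")));
        stats.add(acknowledge(1, 4500L, new Location("10.0.0.3")));
        stats.add(acknowledge(2, 95L, null));
        for (SubtaskStats s : sortedByIp(stats)) {
            System.out.println(toJson(s));
        }
    }
}
```

With the address attached to each row, a slow subtask can be matched to its host directly from the statistics output instead of through a separate TaskManager lookup. A real UI column would likely want numeric-aware ordering of addresses rather than the string ordering used here.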