[ 
https://issues.apache.org/jira/browse/FLINK-39376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-39376:
-----------------------------------
    Labels: pull-request-available  (was: )

> Show TaskManager IP address in Checkpoint Subtask Statistics
> ------------------------------------------------------------
>
>                 Key: FLINK-39376
>                 URL: https://issues.apache.org/jira/browse/FLINK-39376
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 2.1.1
>         Environment: prod
>            Reporter: jingzi
>            Priority: Minor
>              Labels: pull-request-available
>
> h1. Summary
> When diagnosing slow or failing checkpoints, operators need to identify
> which TaskManager hosts are responsible for high checkpoint latency or
> failures. Currently, the checkpoint subtask statistics table in the Flink Web
> UI (/jobs/<job-id>/checkpoints/subtask/<vertex-id>) shows per-subtask metrics
> (state size, duration, alignment, etc.) but does not indicate which
> TaskManager (host/IP) each subtask ran on.
> h1. Motivation
>   - Disk I/O bottlenecks, network issues, or GC pressure on specific nodes
> are common root causes of slow checkpoints. Without host information,
> operators must cross-reference subtask indices with TaskManager assignments
> through a separate UI path.
>   - Showing the IP/hostname directly in the subtask checkpoint statistics
> table shortens the mean time to recovery (MTTR) for checkpoint-related
> incidents.
> h1. Proposed Solution
>   1. Add an ip field to SubtaskStateStats populated from TaskManagerLocation 
> at checkpoint acknowledgement time in PendingCheckpoint.
>   2. Expose the field in the REST API response via 
> SubtaskCheckpointStatistics.CompletedSubtaskCheckpointStatistics.
>   3. Display a sortable "IP Address" column in the Web UI subtask checkpoint 
> statistics table.
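
The three steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not Flink's actual code: SubtaskStatsSketch and CheckpointIpSketch are hypothetical stand-ins for SubtaskStateStats and the REST/UI layers, and the response field names (including "ip") are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for one subtask's checkpoint statistics. */
final class SubtaskStatsSketch {
    final int subtaskIndex;
    final long stateSizeBytes;
    final long durationMillis;
    final String ip; // proposed new field; null when the location is unknown (assumption)

    SubtaskStatsSketch(int subtaskIndex, long stateSizeBytes, long durationMillis, String ip) {
        this.subtaskIndex = subtaskIndex;
        this.stateSizeBytes = stateSizeBytes;
        this.durationMillis = durationMillis;
        this.ip = ip;
    }
}

/** Hypothetical helpers mirroring the three proposed steps. */
final class CheckpointIpSketch {

    /** Step 1: capture the acknowledging TaskManager's address when the ack arrives. */
    static SubtaskStatsSketch onAcknowledge(
            int subtaskIndex, long stateSizeBytes, long durationMillis, String taskManagerAddress) {
        return new SubtaskStatsSketch(subtaskIndex, stateSizeBytes, durationMillis, taskManagerAddress);
    }

    /** Step 2: expose the field alongside the existing per-subtask metrics (field names assumed). */
    static Map<String, Object> toResponseFields(SubtaskStatsSketch s) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("index", s.subtaskIndex);
        fields.put("state_size", s.stateSizeBytes);
        fields.put("duration", s.durationMillis);
        fields.put("ip", s.ip);
        return fields;
    }

    /**
     * Step 3: sort rows by address, as a sortable column would. This is a plain
     * string comparison with unknown (null) addresses last; a real UI might
     * prefer a numeric, per-octet ordering.
     */
    static List<SubtaskStatsSketch> sortedByIp(List<SubtaskStatsSketch> stats) {
        List<SubtaskStatsSketch> copy = new ArrayList<>(stats); // leave the input untouched
        copy.sort(Comparator.comparing(
                (SubtaskStatsSketch s) -> s.ip,
                Comparator.nullsLast(Comparator.<String>naturalOrder())));
        return copy;
    }
}
```

With the address stored next to the other metrics, a slow subtask can be matched to its host in a single lookup instead of a second pass through the TaskManager views.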



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
