wangxiaojing opened a new pull request, #27876:
URL: https://github.com/apache/flink/pull/27876

   What is the purpose of the change      
                                                                                
                                                                                
                                                                                
                                             
     This pull request adds the TaskManager IP address to the per-subtask 
checkpoint statistics displayed in the Flink Web UI and exposed via the REST 
API.                                                                            
                                                      
   
     When diagnosing slow or failing checkpoints, operators need to identify 
which TaskManager host a particular subtask was running on. Without this 
information, correlating checkpoint latency with specific nodes (e.g., machines 
with disk I/O bottlenecks, network issues, or GC
     pressure) requires cross-referencing subtask indices with the TaskManager 
assignment through a separate UI path. This change surfaces the IP address 
directly in the subtask checkpoint statistics table, reducing time-to-diagnose 
for checkpoint-related incidents.
   
     Brief change log
   
     - SubtaskStateStats: Added ip field (nullable String) with getter to carry 
the TaskManager IP address per subtask. Updated serialVersionUID to reflect the 
serialization format change.
     - PendingCheckpoint: When a subtask acknowledges a checkpoint, extract the 
TaskManager's IP via TaskManagerLocation.getAddress().getHostAddress() and 
store it in the resulting SubtaskStateStats.
     - DefaultCheckpointStatsTracker: Pass null for ip on the internal 
stats-only code path where no execution context is available.
     - SubtaskCheckpointStatistics.CompletedSubtaskCheckpointStatistics: Added 
optional ip JSON field to the REST API response. The field is backwards 
compatible — clients that do not recognise it will ignore it.
     - TaskCheckpointStatisticDetailsHandler: Propagate 
SubtaskStateStats.getIp() into the REST response object.
     - Web UI (job-checkpoints-subtask component): Added a sortable "IP 
Address" column to the subtask checkpoint statistics table. The sort function 
correctly normalises IPv4 segments for lexicographic ordering. Null/missing 
values are displayed as -.
   
     Verifying this change
   
     This change added tests and can be verified as follows:
   
     - SubtaskStateStatsTest: Extended existing getter tests to assert that 
getIp() returns the value passed at construction time, and that the value 
survives Java serialization round-trips.
     - PendingCheckpointTest#testAcknowledgeTaskCapturesTaskManagerIp: New 
test. Creates a mock ExecutionVertex backed by a real TaskManagerLocation (), 
calls PendingCheckpoint.acknowledgeTask(), and asserts that
     PendingCheckpointStats.getLatestAcknowledgedSubtaskStats().getIp() equals 
"". This directly validates the extraction path through 
location.getAddress().getHostAddress().
     - TaskCheckpointStatisticsWithSubtaskDetailsTest: Updated JSON 
marshalling/unmarshalling round-trip test to include a realistic IP value , 
validating that the ip field is correctly serialized and deserialized via 
Jackson.
     - DefaultCheckpointStatsTrackerTest, PendingCheckpointStatsTest, 
TaskStateStatsTest, CompletedCheckpointTest: Updated all SubtaskStateStats 
construction call sites to pass the new ip parameter with a consistent null 
sentinel.
   
     Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with 
@Public(Evolving): no
     - The serializers: yes — SubtaskStateStats is Serializable; a new ip field 
was added . Old serialized forms of SubtaskStateStats (checkpoint statistics 
history) are not compatible with the new class.
     This affects checkpoint stats display after a rolling upgrade but does not 
affect checkpoint data correctness.
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes — the change 
touches PendingCheckpoint.acknowledgeTask(), which is on the checkpoint 
acknowledgement path in the JobManager. The change is read-only
     with respect to checkpoint correctness (it only reads TaskManagerLocation 
metadata already available on the ExecutionVertex) and adds no synchronisation 
overhead.
     - The S3 file system connector: no
   
     Documentation
   
     - Does this pull request introduce a new feature? yes
     - If yes, how is the feature documented? not documented (the IP address 
column is self-explanatory in the Web UI; no changes to the public REST API 
contract beyond adding an optional backwards-compatible field)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to