[ https://issues.apache.org/jira/browse/FLINK-23411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764038#comment-17764038 ]

Hangxiang Yu commented on FLINK-23411:
--------------------------------------

Hi [~pnowojski], thanks for picking this up.
I think this is indeed a problem that all task-level metrics share; the 
checkpoint-related metrics just make it more obvious because they are tied to 
checkpoint duration.

[Distributed tracing|https://newrelic.com/blog/how-to-relic/distributed-tracing-anomaly-detection]
and OTEL sound like an interesting idea. Maybe we could still register some 
task-level metrics like these, which could be unregistered, and that could work 
with OTEL.

I'm fine with resolving FLINK-33071 first.

> Expose Flink checkpoint details metrics
> ---------------------------------------
>
>                 Key: FLINK-23411
>                 URL: https://issues.apache.org/jira/browse/FLINK-23411
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Metrics
>    Affects Versions: 1.13.1, 1.12.4
>            Reporter: Jun Qin
>            Assignee: Hangxiang Yu
>            Priority: Major
>              Labels: pull-request-available, stale-assigned
>             Fix For: 1.18.0
>
>
> The checkpoint metrics shown in the Flink Web UI, such as the 
> sync/async/alignment/start delay, are not exposed to the metrics system. This 
> makes problem investigation harder when the Web UI is not enabled: those 
> numbers cannot be obtained from the DEBUG logs. I think we should see how we 
> can expose these metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
