[ https://issues.apache.org/jira/browse/FLINK-14815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976365#comment-16976365 ]
Piotr Nowojski commented on FLINK-14815:
----------------------------------------

Regarding the aggregation of the metrics: if a single subtask is back-pressured, do we report that the whole task is back-pressured? I think that would make sense.

For the pool usages, I'm not sure about the "max" value, as we are losing a lot of fidelity. If any subtask is back-pressured, both its input and output pools will be full, so the aggregated value will also be "100%". That is redundant with the back-pressured status (drawing the task vertex in red). Maybe the average would give us more information? With it, one could judge how many subtasks are affected by the back-pressure.

> Expose network pool usage in IOMetricsInfo
> ------------------------------------------
>
>                 Key: FLINK-14815
>                 URL: https://issues.apache.org/jira/browse/FLINK-14815
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Metrics, Runtime / Network, Runtime / REST
>            Reporter: lining
>            Assignee: lining
>            Priority: Major
>
> * If a subtask is not back-pressured but is causing back pressure (full input, empty output)
> * By comparing exclusive/floating buffers usage, whether all channels are back-pressured or only some of them
> {code:java}
> public final class IOMetricsInfo {
>     private final float outPoolUsage;
>     private final float inputExclusiveBuffersUsage;
>     private final float inputFloatingBuffersUsage;
> }
> {code}
> JobDetailsInfo.JobVertexDetailsInfo merges the values using Math.max. (P.S.: outPoolUsage is from upstream.)
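To make the "max" versus "average" point from the comment concrete, here is a minimal sketch comparing the two aggregations over per-subtask pool usages. The class and method names (`PoolUsageAggregation`, `max`, `average`) are hypothetical and not part of Flink. The sketch assumes each subtask reports its pool usage as a float in the range 0.0 to 1.0.

```java
/**
 * Illustrative only: hypothetical helpers, not Flink code.
 * Each array element is assumed to be one subtask's pool usage in [0.0, 1.0].
 */
public final class PoolUsageAggregation {

    private PoolUsageAggregation() {}

    /** Aggregates with Math.max, as the issue description proposes. */
    public static float max(float[] subtaskUsages) {
        float result = 0.0f;
        for (float usage : subtaskUsages) {
            result = Math.max(result, usage);
        }
        return result;
    }

    /** Aggregates with the arithmetic mean, as the comment suggests. */
    public static float average(float[] subtaskUsages) {
        if (subtaskUsages.length == 0) {
            return 0.0f; // assumption: report 0 when there are no subtasks
        }
        float sum = 0.0f;
        for (float usage : subtaskUsages) {
            sum += usage;
        }
        return sum / subtaskUsages.length;
    }

    public static void main(String[] args) {
        // One of four subtasks is back-pressured (its pool is full).
        float[] outPoolUsage = {1.0f, 0.0f, 0.0f, 0.0f};

        // max saturates as soon as a single subtask is full ...
        System.out.println("max     = " + max(outPoolUsage));
        // ... while the average reflects the share of affected subtasks.
        System.out.println("average = " + average(outPoolUsage));
    }
}
```

With one full pool out of four, `max` reports 1.0 regardless of how many subtasks are affected, whereas `average` reports 0.25, which hints that roughly a quarter of the subtasks are back-pressured.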