[jira] [Comment Edited] (FLINK-17328) Expose network metric for job vertex in rest api

Piotr Nowojski (Jira) Mon, 14 Sep 2020 06:43:19 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17195466#comment-17195466
 ]


Piotr Nowojski edited comment on FLINK-17328 at 9/14/20, 1:42 PM:
------------------------------------------------------------------

What I meant by difficult is that if you have ~100 tasks (with hundreds of 
parallel subtasks each), it's really difficult to understand what's happening 
with the job without visualising the data in the shape of the job graph. In 
textual form, you are forced to look at the tasks (or subtasks, for data skew) 
one by one. Grafana and other metrics visualisers don't help much with that.

Now compare this to looking at a graph with green, yellow or red dots, plus 
a similar marker for the average state of the buffer pools. One quick 
glance and it becomes immediately obvious:
* what is backpressured and what's not
* if there is some data skew involved and on which edges

Moreover, just for the sanity of people using Flink or answering users' 
problems, it's really good to have some basic functionality built into the 
system that lets you understand what's happening.


was (Author: pnowojski):
What I meant by difficult is that if you have ~100 tasks (with hundreds of 
parallel subtasks each), it's really difficult to understand what's happening 
with the job without visualising the data in the shape of the job graph. Have 
you tried doing it [~chesnay]? :) In textual form, you are forced to look at 
the tasks (or subtasks, for data skew) one by one. Grafana and other metrics 
visualisers don't help much with that.

Now compare this to looking at a graph with green, yellow or red dots, plus 
a similar marker for the average state of the buffer pools. One quick 
glance and it becomes immediately obvious:
* what is backpressured and what's not
* if there is some data skew involved and on which edges

Moreover, just for the sanity of people using Flink or answering users' 
problems, it's really good to have some basic functionality built into the 
system that lets you understand what's happening.

> Expose network metric for job vertex in rest api
> ------------------------------------------------
>
>                 Key: FLINK-17328
>                 URL: https://issues.apache.org/jira/browse/FLINK-17328
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Metrics, Runtime / REST
>            Reporter: lining
>            Assignee: lining
>            Priority: Major
>              Labels: pull-request-available
>
> JobDetailsHandler
>  * pool usage: outPoolUsageAvg, inputExclusiveBuffersUsageAvg, 
> inputFloatingBuffersUsageAvg
>  * back-pressured: shows whether the vertex is back-pressured (merged across 
> all of its subtasks)
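
To make the issue description above concrete, here is a minimal sketch of how 
per-subtask values could be rolled up into the vertex-level fields it names 
(outPoolUsageAvg, inputExclusiveBuffersUsageAvg, inputFloatingBuffersUsageAvg, 
back-pressured). This is not the actual JobDetailsHandler code. The classes 
VertexNetworkMetrics, SubtaskSnapshot and VertexSummary are hypothetical, and 
the rule "a vertex is back-pressured if any subtask is" is an assumption about 
what "merged across all of its subtasks" means.

```java
import java.util.List;

/** Illustrative only: hypothetical types, not part of Flink's REST handlers. */
public class VertexNetworkMetrics {

    /** Hypothetical snapshot of one subtask's network metrics (usages in [0, 1]). */
    public static final class SubtaskSnapshot {
        public final double outPoolUsage;
        public final double inputExclusiveBuffersUsage;
        public final double inputFloatingBuffersUsage;
        public final boolean backPressured;

        public SubtaskSnapshot(double outPoolUsage,
                               double inputExclusiveBuffersUsage,
                               double inputFloatingBuffersUsage,
                               boolean backPressured) {
            this.outPoolUsage = outPoolUsage;
            this.inputExclusiveBuffersUsage = inputExclusiveBuffersUsage;
            this.inputFloatingBuffersUsage = inputFloatingBuffersUsage;
            this.backPressured = backPressured;
        }
    }

    /** Vertex-level summary; field names follow the issue description. */
    public static final class VertexSummary {
        public final double outPoolUsageAvg;
        public final double inputExclusiveBuffersUsageAvg;
        public final double inputFloatingBuffersUsageAvg;
        public final boolean backPressured;

        public VertexSummary(double outPoolUsageAvg,
                             double inputExclusiveBuffersUsageAvg,
                             double inputFloatingBuffersUsageAvg,
                             boolean backPressured) {
            this.outPoolUsageAvg = outPoolUsageAvg;
            this.inputExclusiveBuffersUsageAvg = inputExclusiveBuffersUsageAvg;
            this.inputFloatingBuffersUsageAvg = inputFloatingBuffersUsageAvg;
            this.backPressured = backPressured;
        }
    }

    /** Averages the pool usages and ORs the back-pressure flags over all subtasks. */
    public static VertexSummary summarize(List<SubtaskSnapshot> subtasks) {
        if (subtasks.isEmpty()) {
            // No subtasks reported yet: report zero usage and no back pressure.
            return new VertexSummary(0.0, 0.0, 0.0, false);
        }
        double out = 0.0;
        double exclusive = 0.0;
        double floating = 0.0;
        boolean anyBackPressured = false;
        for (SubtaskSnapshot s : subtasks) {
            out += s.outPoolUsage;
            exclusive += s.inputExclusiveBuffersUsage;
            floating += s.inputFloatingBuffersUsage;
            // Assumption: one back-pressured subtask marks the whole vertex.
            anyBackPressured |= s.backPressured;
        }
        int n = subtasks.size();
        return new VertexSummary(out / n, exclusive / n, floating / n, anyBackPressured);
    }
}
```

A UI could then colour each vertex from the single VertexSummary instead of 
making the user read every subtask's metrics one by one. Note that an average 
hides data skew, so per-subtask values would still be needed to spot that.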



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
