[jira] [Commented] (FLINK-14814) Show the vertex that produces the backpressure source in the job

2020-12-31 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256927#comment-17256927
 ] 

Piotr Nowojski commented on FLINK-14814:


The previous approach, using {{isBackPressuredRatio}} and 
{{isCausingBackPressureRatio}}, had a major problem with measurement accuracy 
when load spikes happened faster than the sampling rate (a wave cannot be 
sampled accurately with a sampling rate lower than twice the wave's 
frequency).

Because of that I switched to another approach: using 
{{backPressuredTimeMsPerSecond}} and {{busyTimeMsPerSecond}} which we can 
calculate much more accurately.
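
To illustrate the accuracy problem, here is a minimal Python sketch (not the Flink implementation). The function names, the spiky load pattern, and the 100 ms sampling period are all assumptions chosen for the example. It compares a periodically sampled ratio with time accumulated per second, in the spirit of {{backPressuredTimeMsPerSecond}}.

```python
def sampled_ratio(is_back_pressured, sample_period_ms, total_ms):
    """Fraction of periodic samples that observe back pressure."""
    samples = [is_back_pressured(t) for t in range(0, total_ms, sample_period_ms)]
    return sum(samples) / len(samples)


def back_pressured_ms_per_second(is_back_pressured, total_ms):
    """Count every back-pressured millisecond, normalised to one second."""
    blocked_ms = sum(1 for t in range(total_ms) if is_back_pressured(t))
    return 1000 * blocked_ms / total_ms


def spiky(t_ms):
    # Hypothetical load: back-pressured for the first 10 ms of every 100 ms.
    return t_ms % 100 < 10


# Sampling every 100 ms always lands inside a spike, so the ratio is badly off.
aliased = sampled_ratio(spiky, sample_period_ms=100, total_ms=10_000)
# Accumulating the blocked time reflects the real 10% share.
accumulated = back_pressured_ms_per_second(spiky, total_ms=10_000)

print(aliased)      # 1.0, i.e. "always back-pressured"
print(accumulated)  # 100.0 ms per second, i.e. 10%
```

In this contrived case the sampled ratio reports 100% back pressure while the task is blocked only 10% of the time. A real task would accumulate the time with timers around blocking calls rather than by iterating over milliseconds.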

> Show the vertex that produces the backpressure source in the job
> 
>
> Key: FLINK-14814
> URL: https://issues.apache.org/jira/browse/FLINK-14814
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Metrics, Runtime / Network, Runtime / REST, 
> Runtime / Web Frontend
>Reporter: lining
>Assignee: Piotr Nowojski
>Priority: Major
>  Labels: pull-request-available
> Attachments: 2B0E910D-6D95-401F-B450-1F6B1AFB9BEA.png, Screenshot 
> 2020-12-30 at 14.09.19.png, Screenshot 2020-12-31 at 10.27.52.png
>
>
> By checking the status of output and input buffer pools exposed via 
> FLINK-14815 (output buffer empty, input buffer full) it is possible to 
> display which node is a source of the back pressure. This information could 
> be displayed/accessible in the Web Frontend.
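
The heuristic in the description above (input buffer full, output buffer empty) could be sketched in Python as follows. The thresholds, the function name, and the shape of the input data are hypothetical and are not taken from FLINK-14815.

```python
def find_backpressure_sources(vertices, full=0.9, empty=0.1):
    """Return vertices whose input pool is (nearly) full and output pool is (nearly) empty.

    `vertices` maps a vertex name to an (input_usage, output_usage) pair,
    each in the range 0.0 to 1.0. The thresholds are illustrative assumptions.
    """
    return [
        name
        for name, (in_usage, out_usage) in vertices.items()
        if in_usage >= full and out_usage <= empty
    ]


# Hypothetical three-vertex job: the sink cannot keep up, so everything
# upstream of it fills its buffers.
job = {
    "source": (0.0, 1.0),  # no input, output full: back-pressured
    "map": (1.0, 1.0),     # input and output full: back-pressured
    "sink": (1.0, 0.0),    # input full, output empty: causing the back pressure
}

sources = find_backpressure_sources(job)
print(sources)  # ['sink']
```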



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-14814) Show the vertex that produces the backpressure source in the job

2020-12-31 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256926#comment-17256926
 ] 

Piotr Nowojski commented on FLINK-14814:


New visualisation based on the updated approach:
 !Screenshot 2020-12-31 at 10.27.52.png! 



[jira] [Commented] (FLINK-14814) Show the vertex that produces the backpressure source in the job

2020-12-30 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256515#comment-17256515
 ] 

Piotr Nowojski commented on FLINK-14814:


My current proposal (implemented in the PR) for how to display backpressure 
status looks like this:
 !Screenshot 2020-12-30 at 14.09.19.png! 



[jira] [Commented] (FLINK-14814) Show the vertex that produces the backpressure source in the job

2020-11-20 Thread Matthias (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236101#comment-17236101
 ] 

Matthias commented on FLINK-14814:

FYI: I commented on the state of this issue in its parent FLINK-14712.

> Show the vertex that produces the backpressure source in the job
> 
>
> Key: FLINK-14814
> URL: https://issues.apache.org/jira/browse/FLINK-14814
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Metrics, Runtime / Network, Runtime / REST, 
> Runtime / Web Frontend
>Reporter: lining
>Assignee: lining
>Priority: Major
> Attachments: 2B0E910D-6D95-401F-B450-1F6B1AFB9BEA.png
>
>
> By checking the status of output and input buffer pools exposed via 
> FLINK-14815 (output buffer empty, input buffer full) it is possible to 
> display which node is a source of the back pressure. This information could 
> be displayed/accessible in the Web Frontend.





[jira] [Commented] (FLINK-14814) Show the vertex that produces the backpressure source in the job

2019-11-19 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977229#comment-16977229
 ] 

Piotr Nowojski commented on FLINK-14814:


Ok, sounds good [~lining]. One thing:
> pool usage aggregated by max, min, and the average in every vertex for users 
> to judge vertex

Do we need all of the aggregates? Max and min, for example? See my explanation 
in FLINK-14815 for why I think the min/max aggregates for pool usage might be 
redundant given the average. Besides, presenting too many metrics has a 
couple of potential issues:
# information spam for the user (why show users something they don't need?)
# potential performance implications. Even if not now, then in the future: if 
we add too many metrics now, it will be difficult to drop them later.




[jira] [Commented] (FLINK-14814) Show the vertex that produces the backpressure source in the job

2019-11-18 Thread Yadong Xie (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976469#comment-16976469
 ] 

Yadong Xie commented on FLINK-14814:


Sure [~lining], this proposal looks great, I will review this PR when you 
finish it.



[jira] [Commented] (FLINK-14814) Show the vertex that produces the backpressure source in the job

2019-11-18 Thread lining (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976459#comment-16976459
 ] 

lining commented on FLINK-14814:


Maybe we could create another Jira issue for showing these metrics on the 
vertex, and keep this one for the REST API that exposes these metrics.



[jira] [Commented] (FLINK-14814) Show the vertex that produces the backpressure source in the job

2019-11-18 Thread lining (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976456#comment-16976456
 ] 

lining commented on FLINK-14814:


We want to do both. 
1. Present non-aggregated metrics per subtask, to find which subtask is 
blocked. 
2. Show pool usage aggregated by max, min, and average for every vertex, so 
users can judge the vertex. 
3. Show the back-pressured metric from FLINK-14813 on the vertex.

Maybe [~vthinkxie] could help us review it.



[jira] [Commented] (FLINK-14814) Show the vertex that produces the backpressure source in the job

2019-11-18 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976364#comment-16976364
 ] 

Piotr Nowojski commented on FLINK-14814:


Having multiple output edges is, I think, not that common, and even then one 
can deduce the state from the combined output usage, based on the fact that 
buffers are rarely in states other than "mostly empty" and "mostly full". A 
value of {{outputUsage}} hovering around 50% means one output is full and the 
other is empty. Because of that I wouldn't worry about it too much, at least 
not in the first version.
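
As a small worked example of the 50% case, the Python sketch below assumes that the combined usage is the total number of used buffers divided by the total number of buffers across all outputs. That combination rule, like the function name and buffer counts, is an assumption made for illustration. It is not confirmed here as the way {{outputUsage}} is computed.

```python
def combined_output_usage(used_buffers, total_buffers):
    """Combined usage across outputs: all used buffers over all available buffers."""
    return sum(used_buffers) / sum(total_buffers)


# Two hypothetical outputs with 10 buffers each: one full, one empty.
usage = combined_output_usage(used_buffers=[10, 0], total_buffers=[10, 10])
print(usage)  # 0.5
```

Since a single output is rarely half full, a combined value near 0.5 most plausibly points at one full output and one empty one.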

I think the bigger problem is that your screenshot displays the tasks, not 
individual subtasks/parallel instances. This raises a few questions:
# do we want to present non-aggregated metrics for subtasks?
# do we want to present aggregated metrics for the tasks? ...
# ... if so, how to aggregate the metrics (and who should be doing that)?

1. would be easier to do and significantly more detailed and fine-grained, but 
less user-friendly and more difficult to use.
2. loses some information in exchange for simpler usage.

(we might want to do both, or one first and the other later)

3. we would have to decide how to aggregate individual values. For example, if 
a single subtask is back-pressured, do we report that the whole task is 
back-pressured? For pool usage, should we average the values? Take the max? 
Regarding who should be doing that: it shouldn't be the UI, so in that case we 
would need one more metric-related ticket to actually come up with an idea of 
how to aggregate the metrics.
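
To make the aggregation question concrete, here is a minimal Python sketch of the options mentioned above. The function names, the "any"/"all" policies, and the sample values are hypothetical. None of this is a decided design.

```python
from statistics import mean


def aggregate_pool_usage(subtask_usages):
    """Aggregate per-subtask pool usage (0.0 to 1.0) into min, max and average."""
    return {
        "min": min(subtask_usages),
        "max": max(subtask_usages),
        "avg": mean(subtask_usages),
    }


def task_is_back_pressured(subtask_flags, policy="any"):
    """Decide the task-level status from per-subtask back-pressure flags.

    "any": one back-pressured subtask marks the whole task.
    "all": every subtask must be back-pressured.
    """
    if policy == "any":
        return any(subtask_flags)
    if policy == "all":
        return all(subtask_flags)
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical task with four subtasks, one of which is saturated.
usages = [0.25, 0.25, 0.25, 1.0]
flags = [False, False, False, True]

pool = aggregate_pool_usage(usages)
print(pool)                                  # {'min': 0.25, 'max': 1.0, 'avg': 0.4375}
print(task_is_back_pressured(flags, "any"))  # True
print(task_is_back_pressured(flags, "all"))  # False
```

With a single saturated subtask, the average (0.4375) hides the problem that the max (1.0) exposes, and the "any" and "all" policies disagree. These are the trade-offs the questions above are about.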



[jira] [Commented] (FLINK-14814) Show the vertex that produces the backpressure source in the job

2019-11-17 Thread lining (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976238#comment-16976238
 ] 

lining commented on FLINK-14814:


Now we may just show this information in the Web Frontend.



[jira] [Commented] (FLINK-14814) Show the vertex that produces the backpressure source in the job

2019-11-15 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974967#comment-16974967
 ] 

Piotr Nowojski commented on FLINK-14814:


One clarifying question [~lining]. What's the scope of this ticket? Would you 
like this information to be visible in the Web Frontend, or just exposed via 
some metric?
