[jira] [Commented] (FLINK-8625) Move OutputFlusher thread to Netty scheduled executor

2021-04-16 Thread Flink Jira Bot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17323581#comment-17323581
 ] 

Flink Jira Bot commented on FLINK-8625:
---

This issue is assigned but has not received an update in 7 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Move OutputFlusher thread to Netty scheduled executor
> -
>
> Key: FLINK-8625
> URL: https://issues.apache.org/jira/browse/FLINK-8625
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Network
>Reporter: Piotr Nowojski
>Assignee: Piotr Nowojski
>Priority: Major
>  Labels: stale-assigned
>
> This will allow us to trigger/schedule next flush only if we are not 
> currently busy. 
> PR: https://github.com/apache/flink/pull/6698



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-8625) Move OutputFlusher thread to Netty scheduled executor

2018-02-16 Thread Piotr Nowojski (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366763#comment-16366763
 ] 

Piotr Nowojski commented on FLINK-8625:
---

Yes. Instead of this OutputFlasher thread we want to move this job to 
{{PartitionRequestQueue}} inside Netty's thread pool. 
\{{PartitionRequestQueue}} could just constantly "busy loop" over all of the 
subpartitions and if it encounters an "empty" subpartition, it could back 
of/try again later with some timeout. This will have following advantages:
 # No GC issues with constant and frequent scheduling data notification caused 
by current OutputFlusher thread. If we have 1000 output channels and 1ms flush 
timeout this adds up to 1 000 000 notifications per second.
 # No context switching and no synchronisation for the flushing
 # Instead of notifying 1000 output channels one by one as it's happening right 
now, one thread running one instance of \{{PartitionRequestQueue}} will handle 
all of the assigned to him output channels - with 10 Netty threads, each thread 
will handle ~100  output channels in one flush.
 # In case of full throughput scenario when all partitions are constantly full, 
we can skip the flush.

> Move OutputFlusher thread to Netty scheduled executor
> -
>
> Key: FLINK-8625
> URL: https://issues.apache.org/jira/browse/FLINK-8625
> Project: Flink
>  Issue Type: Sub-task
>  Components: Network
>Reporter: Piotr Nowojski
>Priority: Major
>
> This will allow us to trigger/schedule next flush only if we are not 
> currently busy. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8625) Move OutputFlusher thread to Netty scheduled executor

2018-02-15 Thread mingleizhang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366531#comment-16366531
 ] 

mingleizhang commented on FLINK-8625:
-

Hi, [~pnowojski] One thing I would like to confirm. We want to replace the 
original way which in every instance of {{StreamRecordWriter}} there will be 
create a thread to do the flush work by scheduling task with {{EventLoop}} ?

> Move OutputFlusher thread to Netty scheduled executor
> -
>
> Key: FLINK-8625
> URL: https://issues.apache.org/jira/browse/FLINK-8625
> Project: Flink
>  Issue Type: Sub-task
>  Components: Network
>Reporter: Piotr Nowojski
>Priority: Major
>
> This will allow us to trigger/schedule next flush only if we are not 
> currently busy. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8625) Move OutputFlusher thread to Netty scheduled executor

2018-02-12 Thread Piotr Nowojski (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360611#comment-16360611
 ] 

Piotr Nowojski commented on FLINK-8625:
---

I have found one more thing. After fixing the current performance bottlenecks 
in https://issues.apache.org/jira/browse/FLINK-8581 , currently GC pressure 
caused by OutputFlasher is our biggest performance bottleneck/issue. 
OutputFlasher executed once per 1ms for 1000 output channels enqueue every 1ms 
1000 elements on a internal Netty's executor. I presume those objects are 
pilling up and ending up in old GC generation.

This GC pressure is causing huge throughput fluctuations (because of long GC 
pauses) between 20,000 records/ms down to 160 records/ms. Those long GC pauses 
are quite dangerous, since they can cause Jobs failure.

> Move OutputFlusher thread to Netty scheduled executor
> -
>
> Key: FLINK-8625
> URL: https://issues.apache.org/jira/browse/FLINK-8625
> Project: Flink
>  Issue Type: Sub-task
>  Components: Network
>Reporter: Piotr Nowojski
>Priority: Major
>
> This will allow us to trigger/schedule next flush only if we are not 
> currently busy. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8625) Move OutputFlusher thread to Netty scheduled executor

2018-02-12 Thread mingleizhang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360533#comment-16360533
 ] 

mingleizhang commented on FLINK-8625:
-

Thanks [~pnowojski] We can improve the performance and reduce the resources 
usage with executor, 

> Move OutputFlusher thread to Netty scheduled executor
> -
>
> Key: FLINK-8625
> URL: https://issues.apache.org/jira/browse/FLINK-8625
> Project: Flink
>  Issue Type: Sub-task
>  Components: Network
>Reporter: Piotr Nowojski
>Priority: Major
>
> This will allow us to trigger/schedule next flush only if we are not 
> currently busy. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)