[ 
https://issues.apache.org/jira/browse/FLINK-14118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937579#comment-16937579
 ] 

Piotr Nowojski commented on FLINK-14118:
----------------------------------------

I guess your assessment might be right, that the JVM is "merging"/combining both 
memory read fences in this case. I'm not happy about adding another 
synchronisation point, as it complicates the threading model and might break 
performance if we refactor the code, but the fix is indeed much simpler than 
the alternatives. +1 for this solution (over moving the output flusher to 
mailbox/netty), as long as there are no visible performance regressions. 

[~kevin.cyj] let's move the discussion to the PR (please post the benchmark 
results there). Could you also add a new benchmark to the PR covering this 
scenario?

I wouldn't back-port this to older releases; let's just fix it in 1.10. 
Old release branches do not have the same test coverage as the master branch, 
and a lot of bugs in the network stack are discovered only by repeatedly 
running the same ITCases over and over again. Also, we do not run any benchmarks 
on release branches. If we merge this to the 1.8 or 1.9 branches, we might miss 
something (either because they will be released too quickly or because of some 
different interplay with other components).

> Reduce the unnecessary flushing when there is no data available for flush
> -------------------------------------------------------------------------
>
>                 Key: FLINK-14118
>                 URL: https://issues.apache.org/jira/browse/FLINK-14118
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>            Reporter: Yingjie Cao
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.10.0, 1.9.1, 1.8.3
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The new flush implementation, which works by triggering a netty user event, 
> may cause a performance regression compared to the old synchronization-based 
> one. More specifically, when there is exactly one BufferConsumer in the 
> buffer queue of a subpartition and no new data will be added for a while 
> (either because there is simply no input, or because the operator collects 
> data for processing and does not emit records immediately), i.e. there is no 
> data to send, the OutputFlusher will continuously signal data availability 
> and wake up the netty thread, even though the pollBuffer method returns no 
> data.
> For some of our production jobs, this incurs 20% to 40% CPU overhead 
> compared to the old implementation. We tried to fix the problem by checking 
> whether new data is available when flushing; if there is none, the netty 
> thread is not notified. This works for our jobs, and CPU usage falls back to 
> the previous level.
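The fix described above can be sketched roughly as follows. This is a minimal, 
hypothetical model (class and method names are illustrative, not the actual 
Flink internals): the periodic flusher only wakes the netty thread when the 
subpartition holds data beyond the single, still-open buffer at the head of 
the queue.

```java
import java.util.ArrayDeque;

// Hedged sketch of the proposed fix: skip the netty wake-up when the
// subpartition has nothing new to send. Names are hypothetical.
public class FlushSketch {
    static class Subpartition {
        private final ArrayDeque<int[]> buffers = new ArrayDeque<>();
        int wakeUps; // how often the "netty thread" was notified

        synchronized void add(int[] buffer) { buffers.add(buffer); }

        // Old behaviour: every flusher tick notifies the netty thread,
        // even if pollBuffer() would return nothing new.
        synchronized void flushAlways() { wakeUps++; }

        // Proposed behaviour: notify only when there is data beyond the
        // single, still-open BufferConsumer at the head of the queue.
        synchronized void flushIfAvailable() {
            if (buffers.size() > 1) {
                wakeUps++;
            }
        }
    }

    public static void main(String[] args) {
        Subpartition old = new Subpartition();
        Subpartition fixed = new Subpartition();
        old.add(new int[]{1});    // one open buffer, no new data afterwards
        fixed.add(new int[]{1});
        for (int i = 0; i < 100; i++) { // 100 flusher ticks with no input
            old.flushAlways();
            fixed.flushIfAvailable();
        }
        // Old behaviour wakes the netty thread on every tick; the fixed
        // behaviour never does, since no new data ever arrived.
        System.out.println(old.wakeUps + " " + fixed.wakeUps);
    }
}
```

With no new data arriving, the old variant produces 100 spurious wake-ups 
while the fixed variant produces none, which is the CPU saving the reporter 
measured in production.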



--
This message was sent by Atlassian Jira
(v8.3.4#803005)