[ 
https://issues.apache.org/jira/browse/FLINK-10727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671728#comment-16671728
 ] 

ASF GitHub Bot commented on FLINK-10727:
----------------------------------------

tillrohrmann commented on issue #6974: [FLINK-10727][network] remove 
unnecessary synchronization in SingleInputGate#requestPartitions()
URL: https://github.com/apache/flink/pull/6974#issuecomment-435073195
 
 
   This change breaks Flink's iteration mechanism and potentially more, 
because `SingleInputGate#requestedPartitionsFlag` is read in 
`SingleInputGate#updateInputChannel`, which is not called from the `Task` 
thread. As a result, `updateInputChannel` may not request the subpartitions 
of newly registered partitions, and a job can get stuck because it never 
consumes the input from a producer.
   
   You can easily reproduce the problem by adding a `Thread.sleep(10L)` before 
setting `requestedPartitionsFlag = true`.
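To make the race concrete, here is a minimal, hypothetical Java model of it. It is not Flink code: `RequestPartitionsRace`, `requestPartitionsUnsynchronized()`, `requestPartitionsSynchronized()` and the `beforeSettingFlag` hook are illustrative names, and string ids stand in for input channels. It assumes the broken variant requests the known channels and sets the flag without holding `requestLock` across that whole sequence. A latch plays the role of the `Thread.sleep(10L)`: another thread registers channel `b` after the existing channels were requested but before the flag is set, so `updateInputChannel` still sees `false` and leaves `b` to a request that never comes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CountDownLatch;

// Hypothetical model, NOT Flink code: channel ids stand in for input channels and
// "requesting a subpartition" just records the id in `requested`.
class RequestPartitionsRace {
    private final Object requestLock = new Object();
    private final List<String> channels = new ArrayList<>(List.of("a"));
    private final Set<String> requested = Collections.synchronizedSet(new LinkedHashSet<>());
    private boolean requestedPartitionsFlag;
    // Runs between requesting and setting the flag, like the Thread.sleep(10L) repro.
    private Runnable beforeSettingFlag = () -> {};

    // Assumed broken shape: check, request and set the flag without holding requestLock
    // across the sequence (the list is copied under the lock only to keep the model safe).
    void requestPartitionsUnsynchronized() {
        if (!requestedPartitionsFlag) {
            List<String> snapshot;
            synchronized (requestLock) {
                snapshot = new ArrayList<>(channels);
            }
            requested.addAll(snapshot);
            beforeSettingFlag.run();        // a channel registered now sees the flag as false...
            requestedPartitionsFlag = true; // ...and nobody requests it afterwards
        }
    }

    // Pre-change shape: the whole check-request-set sequence holds requestLock.
    void requestPartitionsSynchronized() {
        synchronized (requestLock) {
            if (!requestedPartitionsFlag) {
                requested.addAll(channels);
                beforeSettingFlag.run();
                requestedPartitionsFlag = true;
            }
        }
    }

    // Called from a non-task thread: requests the new channel only if the flag is
    // already set; otherwise it relies on requestPartitions() to do so.
    void updateInputChannel(String channel) {
        synchronized (requestLock) {
            channels.add(channel);
            if (requestedPartitionsFlag) {
                requested.add(channel);
            }
        }
    }

    // Registers "b" after the known channels were requested but before the flag is set.
    static Set<String> race(boolean synchronizedVariant) {
        RequestPartitionsRace gate = new RequestPartitionsRace();
        CountDownLatch inWindow = new CountDownLatch(1);
        CountDownLatch updated = new CountDownLatch(1);
        gate.beforeSettingFlag = () -> {
            inWindow.countDown();
            if (!synchronizedVariant) {
                awaitLatch(updated); // only wait outside the lock; under requestLock this deadlocks
            }
        };
        Thread updater = new Thread(() -> {
            awaitLatch(inWindow);
            gate.updateInputChannel("b"); // blocks on requestLock in the synchronized variant
            updated.countDown();
        });
        updater.start();
        if (synchronizedVariant) {
            gate.requestPartitionsSynchronized();
        } else {
            gate.requestPartitionsUnsynchronized();
        }
        try {
            updater.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new LinkedHashSet<>(gate.requested);
    }

    private static void awaitLatch(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unsynchronized: " + race(false)); // [a]
        System.out.println("synchronized:   " + race(true));  // [a, b]
    }
}
```

With `requestLock` held across the check, the requests and the flag update, `updateInputChannel` blocks until the flag is set and then requests `b` itself.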
   
   I'm wondering how much improvement these kinds of changes actually bring. 
I'm a bit concerned that changes to such a critical component as the network 
stack get merged into master just before the feature freeze. If at all, 
something like this should be merged at the beginning of the release cycle to 
give it more exposure. Moreover, Travis never passed and actually failed with 
an IT case running into exactly this problem. IntelliJ also warns that 
`requestedPartitionsFlag` is accessed in both synchronized and unsynchronized 
contexts, which should be a red flag in most cases. I think we should be more 
careful in the future!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove unnecessary synchronization in SingleInputGate#requestPartitions()
> -------------------------------------------------------------------------
>
>                 Key: FLINK-10727
>                 URL: https://issues.apache.org/jira/browse/FLINK-10727
>             Project: Flink
>          Issue Type: Improvement
>          Components: Network
>    Affects Versions: 1.5.5, 1.6.2, 1.7.0
>            Reporter: Nico Kruber
>            Assignee: Nico Kruber
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
>
>
> For every {{SingleInputGate#getNextBufferOrEvent()}}, 
> {{SingleInputGate#requestPartitions()}} is called and this always 
> synchronizes on the {{requestLock}} before checking the 
> {{requestedPartitionsFlag}}. Since {{SingleInputGate#requestPartitions()}} is 
> only called from the same thread (the task thread getting the record), it is 
> enough to check the {{requestedPartitionsFlag}} first before synchronizing 
> for the actual requests (if needed). {{UnionInputGate}} already takes the 
> same approach in its {{requestPartitions()}} implementation.
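The fast path proposed in the description above can be sketched as follows. This is again a hypothetical model, not Flink's `SingleInputGate`: `FastPathGate`, the `lockAcquisitions` counter and string channel ids are illustrative. The flag is read without the lock only by the task thread, which is also its only writer, so that read always sees the thread's own earlier write. The requests and the flag update still happen while holding `requestLock`, and that is what keeps `updateInputChannel` from missing a channel.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, NOT Flink's SingleInputGate: check the flag first and take
// requestLock only while partitions still have to be requested.
class FastPathGate {
    private final Object requestLock = new Object();
    private final List<String> channels = new ArrayList<>();
    private final Set<String> requested = new LinkedHashSet<>();
    // Written only by the task thread, and only while holding requestLock.
    private boolean requestedPartitionsFlag;
    private int lockAcquisitions; // illustration only: how often the slow path ran

    // Called by the task thread for every buffer/event it fetches (in this model).
    void requestPartitions() {
        // Fast path: the task thread is the flag's only writer, so it sees its own write.
        if (requestedPartitionsFlag) {
            return;
        }
        synchronized (requestLock) {
            lockAcquisitions++;
            if (!requestedPartitionsFlag) { // re-check under the lock
                requested.addAll(channels);
                requestedPartitionsFlag = true; // visible to updateInputChannel via the lock
            }
        }
    }

    // Called from a non-task thread.
    void updateInputChannel(String channel) {
        synchronized (requestLock) {
            channels.add(channel);
            if (requestedPartitionsFlag) {
                requested.add(channel);
            }
        }
    }

    int lockAcquisitions() {
        synchronized (requestLock) {
            return lockAcquisitions;
        }
    }

    boolean allChannelsRequested() {
        synchronized (requestLock) {
            return requested.equals(new LinkedHashSet<>(channels));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FastPathGate gate = new FastPathGate();
        gate.updateInputChannel("a");
        Thread updater = new Thread(() -> {
            for (int i = 0; i < 100; i++) {
                gate.updateInputChannel("c" + i);
            }
        });
        updater.start();
        for (int i = 0; i < 1000; i++) {
            gate.requestPartitions(); // task thread
        }
        updater.join();
        System.out.println(gate.allChannelsRequested() + " " + gate.lockAcquisitions()); // true 1
    }
}
```

Calling `requestPartitions()` repeatedly takes the lock only once, and a channel registered afterwards is requested directly by `updateInputChannel`.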



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
