[ https://issues.apache.org/jira/browse/FLINK-10727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671728#comment-16671728 ]
ASF GitHub Bot commented on FLINK-10727:
----------------------------------------

tillrohrmann commented on issue #6974: [FLINK-10727][network] remove unnecessary synchronization in SingleInputGate#requestPartitions()
URL: https://github.com/apache/flink/pull/6974#issuecomment-435073195

This change breaks Flink's iteration mechanism, and potentially more, because `SingleInputGate#requestedPartitionsFlag` is read in `SingleInputGate#updateInputChannel`, which is not called from the `Task` thread. As a result, `updateInputChannel` may not request the subpartition of a newly registered partition, and the job can get stuck because it never consumes the input from that producer. You can easily reproduce the problem by adding a `Thread.sleep(10L)` before setting `requestedPartitionsFlag = true`.

I'm wondering how much improvement this kind of change actually brings. I'm a bit concerned that changes to a component as critical as the network stack get merged into master just before feature freeze. If at all, something like this should be merged at the beginning of the release cycle to give it more exposure.

Moreover, Travis never passed, and it actually failed with an IT case running into exactly this problem. IntelliJ also warns that `requestedPartitionsFlag` is accessed in both synchronized and unsynchronized contexts, which should be a red flag in most cases. I think we should be more careful in the future!

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Remove unnecessary synchronization in SingleInputGate#requestPartitions()
> -------------------------------------------------------------------------
>
> Key: FLINK-10727
> URL: https://issues.apache.org/jira/browse/FLINK-10727
> Project: Flink
> Issue Type: Improvement
> Components: Network
> Affects Versions: 1.5.5, 1.6.2, 1.7.0
> Reporter: Nico Kruber
> Assignee: Nico Kruber
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.7.0
>
>
> For every {{SingleInputGate#getNextBufferOrEvent()}},
> {{SingleInputGate#requestPartitions()}} is called and this always
> synchronizes on the {{requestLock}} before checking the
> {{requestedPartitionsFlag}}. Since {{SingleInputGate#requestPartitions()}} is
> only called from the same thread (the task thread getting the record), it is
> enough to check the {{requestedPartitionsFlag}} first before synchronizing
> for the actual requests (if needed). {{UnionInputGate}} already goes the same
> way in its {{requestPartitions()}} implementation.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
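
To make the interleaving described in the comment concrete, here is a minimal toy sketch. It is not Flink's code and not the code from the pull request: `ToyGate`, the string "channels", and the `afterLockReleased` hook are hypothetical stand-ins. It assumes one way the race could arise, namely that the flag is written only after `requestLock` has been released. The hook replays, on a single thread, what a second thread calling `updateInputChannel` could do in that gap (the window that the suggested `Thread.sleep(10L)` would widen).

```java
import java.util.ArrayList;
import java.util.List;

public class RequestFlagSketch {

    /** Toy stand-in for an input gate; NOT Flink's SingleInputGate. */
    static class ToyGate {
        private final Object requestLock = new Object();
        private final List<String> channels = new ArrayList<>();
        private final List<String> requested = new ArrayList<>();
        private final boolean setFlagInsideLock;
        private volatile boolean requestedPartitionsFlag;

        // Hypothetical hook: runs right after requestLock is released, to
        // replay deterministically what another thread could do at that point.
        Runnable afterLockReleased = () -> { };

        ToyGate(boolean setFlagInsideLock) {
            this.setFlagInsideLock = setFlagInsideLock;
        }

        void requestPartitions() {
            if (requestedPartitionsFlag) {
                return; // fast path: no lock once everything was requested
            }
            synchronized (requestLock) {
                if (requestedPartitionsFlag) {
                    return; // re-check under the lock
                }
                requested.addAll(channels); // request all known channels
                if (setFlagInsideLock) {
                    requestedPartitionsFlag = true; // published before unlock
                }
            }
            afterLockReleased.run();
            if (!setFlagInsideLock) {
                requestedPartitionsFlag = true; // assumed racy variant: too late
            }
        }

        void updateInputChannel(String channel) {
            synchronized (requestLock) {
                channels.add(channel);
                // Only request the new channel if the initial requests are
                // believed to be done already.
                if (requestedPartitionsFlag) {
                    requested.add(channel);
                }
            }
        }

        List<String> unrequested() {
            synchronized (requestLock) {
                List<String> rest = new ArrayList<>(channels);
                rest.removeAll(requested);
                return rest;
            }
        }
    }

    /** Replays the interleaving and returns channels that were never requested. */
    static List<String> run(boolean setFlagInsideLock) {
        ToyGate gate = new ToyGate(setFlagInsideLock);
        gate.updateInputChannel("a"); // registered before any request
        gate.afterLockReleased = () -> gate.updateInputChannel("late");
        gate.requestPartitions();
        gate.requestPartitions(); // later calls take the fast path
        return gate.unrequested();
    }

    public static void main(String[] args) {
        System.out.println("flag set outside lock: unrequested = " + run(false));
        System.out.println("flag set inside lock:  unrequested = " + run(true));
    }
}
```

In this sketch, the variant that sets the flag outside the lock leaves the channel `late` unrequested for good, because every later `requestPartitions()` call returns on the fast path. Setting the flag while still holding `requestLock` (with the flag declared `volatile` for the unlocked read) keeps the cheap check and closes the window.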