[
https://issues.apache.org/jira/browse/FLINK-9761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16658543#comment-16658543
]
zhijiang edited comment on FLINK-9761 at 10/22/18 3:56 AM:
-----------------------------------------------------------
I quickly reviewed the related code and think this is still a problem, though
one that exists only in non-credit-based mode.
When {{PartitionRequestClientHandler.BufferListenerTask#notifyBufferDestroyed}}
is called by the canceler thread while a {{stagedBufferResponse}} currently
exists, we directly set {{stagedBufferResponse = null}}, so there is no longer
any chance to consume and release this netty message, which results in a leak.
Even if {{stagedMessages}} is not empty, the {{stagedMessagesHandler}}
would only consume and release the messages in the {{stagedMessages}} list; it
would not consume and release {{stagedBufferResponse}} first. So I think the
logic is still flawed.
[~NicoK], could you double check whether my reading of the above issue is correct?
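
To make the suspected leak concrete, here is a minimal, single-threaded sketch. It is not the Flink code: {{FakeBufferResponse}}, {{ListenerSketch}} and both {{notifyBufferDestroyed...}} variants are hypothetical stand-ins, and only the field names {{stagedBufferResponse}} and {{stagedMessages}} are taken from the discussion above. It assumes a reference-counted message that leaks unless {{release()}} is called, and it ignores the threading aspect entirely.

```java
import java.util.ArrayDeque;

// Hypothetical stand-in for a reference-counted netty message.
final class FakeBufferResponse {
    private int refCount = 1;

    void release() {
        if (refCount == 0) {
            throw new IllegalStateException("already released");
        }
        refCount--;
    }

    boolean isReleased() {
        return refCount == 0;
    }
}

// Hypothetical model of the listener state described in the comment.
final class ListenerSketch {
    private FakeBufferResponse stagedBufferResponse;
    private final ArrayDeque<FakeBufferResponse> stagedMessages = new ArrayDeque<>();

    void stage(FakeBufferResponse response) {
        stagedBufferResponse = response;
    }

    void enqueue(FakeBufferResponse response) {
        stagedMessages.add(response);
    }

    // Models the behaviour described above: the reference is dropped
    // without being released, so the message can never be recycled.
    void notifyBufferDestroyedLeaky() {
        stagedBufferResponse = null;
        drainStagedMessages();
    }

    // One possible remedy (an assumption, not the actual fix):
    // release the pending response before clearing the field.
    void notifyBufferDestroyedReleasing() {
        FakeBufferResponse pending = stagedBufferResponse;
        stagedBufferResponse = null;
        if (pending != null) {
            pending.release();
        }
        drainStagedMessages();
    }

    // Stand-in for the staged-messages handler: it only ever sees the queue.
    private void drainStagedMessages() {
        FakeBufferResponse next;
        while ((next = stagedMessages.poll()) != null) {
            next.release();
        }
    }
}

final class StagedResponseDemo {
    public static void main(String[] args) {
        FakeBufferResponse first = new FakeBufferResponse();
        ListenerSketch leaky = new ListenerSketch();
        leaky.stage(first);
        leaky.notifyBufferDestroyedLeaky();
        System.out.println("leaky variant released: " + first.isReleased());

        FakeBufferResponse second = new FakeBufferResponse();
        ListenerSketch releasing = new ListenerSketch();
        releasing.stage(second);
        releasing.notifyBufferDestroyedReleasing();
        System.out.println("releasing variant released: " + second.isReleased());
    }
}
```

In this model the queue drain never touches the staged response, which is why the first variant leaves it unreleased regardless of whether the queue is empty.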
> Potential buffer leak in PartitionRequestClientHandler during job failures
> --------------------------------------------------------------------------
>
> Key: FLINK-9761
> URL: https://issues.apache.org/jira/browse/FLINK-9761
> Project: Flink
> Issue Type: Bug
> Components: Network
> Affects Versions: 1.5.0
> Reporter: Nico Kruber
> Assignee: Nico Kruber
> Priority: Critical
> Fix For: 1.5.6, 1.6.3, 1.7.0
>
>
> {{PartitionRequestClientHandler#stagedMessages}} may be accessed from
> multiple threads:
> 1) Netty's IO thread
> 2) During cancellation,
> {{PartitionRequestClientHandler.BufferListenerTask#notifyBufferDestroyed}} is
> called
> If {{PartitionRequestClientHandler.BufferListenerTask#notifyBufferDestroyed}}
> thinks that {{stagedMessages}} is empty, however, it will not install the
> {{stagedMessagesHandler}} that consumes and releases buffers from received
> messages.
> Unless some unexpected combination of code calls prevents this from
> happening, this would leak the non-recycled buffers.
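
The check-then-act race described in the issue can be avoided in general by confining all access to the queue to one thread. The sketch below illustrates that idea only; it is an assumption about one possible approach, not the fix applied in Flink. {{SingleThreadConfinementSketch}} and its methods are hypothetical, and a single-threaded {{ExecutorService}} stands in for Netty's IO thread.

```java
import java.util.ArrayDeque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: every read and write of stagedMessages runs on one thread.
final class SingleThreadConfinementSketch {
    // Stand-in for the IO thread (assumption: one thread owns the queue).
    private final ExecutorService ioThread = Executors.newSingleThreadExecutor();

    // Only ever touched from tasks running on ioThread.
    private final ArrayDeque<Runnable> stagedMessages = new ArrayDeque<>();

    // Counts how many staged messages have been released.
    final AtomicInteger released = new AtomicInteger();

    // Stages a message whose "release" simply bumps the counter.
    void stageMessage() {
        ioThread.execute(() -> stagedMessages.add(released::incrementAndGet));
    }

    // May be called from any thread, e.g. a canceler thread. The emptiness
    // check and the drain both happen on ioThread, so they cannot interleave
    // with a concurrent stageMessage() task.
    void notifyBufferDestroyed() {
        ioThread.execute(() -> {
            Runnable release;
            while ((release = stagedMessages.poll()) != null) {
                release.run();
            }
        });
    }

    // Stops accepting tasks and waits for the queued ones to finish.
    boolean shutdownAndAwait() {
        ioThread.shutdown();
        try {
            return ioThread.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because tasks on a single-threaded executor run in submission order, a cancellation submitted after two staged messages drains both of them.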