[ 
https://issues.apache.org/jira/browse/FLINK-17823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijiang updated FLINK-17823:
-----------------------------
    Comment: was deleted

(was: Merged in master: 8c7c7267be95cddd7122d2b97e5334f5db4cc37c)

> Resolve the race condition while releasing RemoteInputChannel
> -------------------------------------------------------------
>
>                 Key: FLINK-17823
>                 URL: https://issues.apache.org/jira/browse/FLINK-17823
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.11.0
>            Reporter: Zhijiang
>            Assignee: Zhijiang
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.11.0
>
>
> RemoteInputChannel#releaseAllResources might be called by the canceler thread
> while the task thread is concurrently calling RemoteInputChannel#getNextBuffer.
> This can cause two potential problems:
>  * The task thread might get a null buffer after the canceler thread has
> already released all the buffers, which causes a misleading NPE in getNextBuffer.
>  * The task thread and the canceler thread might pull the same buffer
> concurrently, which causes an unexpected exception when the same buffer is
> recycled twice.
> The solution is to properly synchronize access to the buffer queue in the
> release method, so that the same buffer cannot be pulled by both the canceler
> thread and the task thread. In the getNextBuffer method, we also add explicit
> checks to avoid the misleading NPE and throw meaningful exceptions instead.
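
The synchronization described in the issue can be illustrated with a minimal Java sketch. It is not the actual Flink patch: `ChannelReleaseSketch`, `SketchBuffer`, `SketchInputChannel` and `onBuffer` are hypothetical stand-ins, and the real RemoteInputChannel code may differ in structure and in the exceptions it throws. The sketch assumes a single lock guarding the buffer queue, so that each buffer is polled by exactly one thread, and an explicit release check in place of a bare null dereference.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not Flink code: all names below are assumptions.
final class ChannelReleaseSketch {

    /** Stand-in for a network buffer; fails loudly if recycled twice. */
    static final class SketchBuffer {
        private final AtomicBoolean recycled = new AtomicBoolean(false);

        void recycle() {
            if (!recycled.compareAndSet(false, true)) {
                throw new IllegalStateException("Buffer recycled twice.");
            }
        }

        boolean isRecycled() {
            return recycled.get();
        }
    }

    /** Stand-in for the input channel shared by the task and canceler threads. */
    static final class SketchInputChannel {
        private final ArrayDeque<SketchBuffer> receivedBuffers = new ArrayDeque<>();
        private final AtomicBoolean isReleased = new AtomicBoolean(false);

        /** Network side: enqueue a buffer, or recycle it if the channel is gone. */
        void onBuffer(SketchBuffer buffer) {
            boolean released;
            synchronized (receivedBuffers) {
                released = isReleased.get();
                if (!released) {
                    receivedBuffers.add(buffer);
                }
            }
            if (released) {
                buffer.recycle();
            }
        }

        /** Task thread: poll under the lock and report a release explicitly. */
        Optional<SketchBuffer> getNextBuffer() {
            if (isReleased.get()) {
                throw new IllegalStateException("Queried for a buffer after channel release.");
            }
            SketchBuffer next;
            synchronized (receivedBuffers) {
                next = receivedBuffers.poll();
            }
            if (next == null) {
                // An empty queue is either "released concurrently" or "no data yet";
                // distinguish the two instead of dereferencing null later.
                if (isReleased.get()) {
                    throw new IllegalStateException("Channel was released concurrently.");
                }
                return Optional.empty();
            }
            return Optional.of(next);
        }

        /** Canceler thread: drain the queue under the same lock, then recycle. */
        void releaseAllResources() {
            if (!isReleased.compareAndSet(false, true)) {
                return; // already released by another caller
            }
            List<SketchBuffer> toRecycle;
            synchronized (receivedBuffers) {
                toRecycle = new ArrayList<>(receivedBuffers);
                receivedBuffers.clear();
            }
            for (SketchBuffer buffer : toRecycle) {
                buffer.recycle();
            }
        }
    }

    public static void main(String[] args) {
        SketchInputChannel channel = new SketchInputChannel();
        channel.onBuffer(new SketchBuffer());
        channel.getNextBuffer().ifPresent(SketchBuffer::recycle); // task thread path
        channel.releaseAllResources();                            // canceler thread path
    }
}
```

Because both `getNextBuffer` and `releaseAllResources` remove buffers only while holding the queue lock, a given buffer ends up with exactly one owner, which is what rules out the double recycle in this sketch.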



--
This message was sent by Atlassian Jira
(v8.3.4#803005)