[ https://issues.apache.org/jira/browse/FLINK-17823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhijiang updated FLINK-17823:
-----------------------------
    Comment: was deleted

(was: Merged in master: 8c7c7267be95cddd7122d2b97e5334f5db4cc37c)

> Resolve the race condition while releasing RemoteInputChannel
> -------------------------------------------------------------
>
>                 Key: FLINK-17823
>                 URL: https://issues.apache.org/jira/browse/FLINK-17823
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.11.0
>            Reporter: Zhijiang
>            Assignee: Zhijiang
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.11.0
>
>
> RemoteInputChannel#releaseAllResources might be called by the canceler thread
> while the task thread concurrently calls RemoteInputChannel#getNextBuffer.
> This can cause two potential problems:
> * The task thread might get a null buffer after the canceler thread has already
>   released all the buffers, which then causes a misleading NPE in getNextBuffer.
> * The task thread and the canceler thread might pull the same buffer concurrently,
>   which causes an unexpected exception when that buffer is recycled twice.
> The solution is to properly synchronize on the buffer queue in the release method,
> so that the same buffer cannot be pulled by both the canceler thread and the task
> thread. In addition, getNextBuffer gets some explicit checks that avoid the
> misleading NPE and throw meaningful exceptions instead.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
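The locking pattern described in the issue can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code that was merged for FLINK-17823: `SketchInputChannel`, `SketchBuffer`, `onBuffer` and the exception messages are invented for the example, and only the method names `getNextBuffer` and `releaseAllResources` mirror the issue text. The sketch assumes one lock on the buffer queue guards both the queue and a "released" flag, so every buffer leaves the queue exactly once: either the task thread polls it or the canceler thread drains it.

```java
import java.util.ArrayDeque;

/**
 * Hypothetical, heavily simplified stand-in for RemoteInputChannel that only
 * illustrates the synchronization idea from FLINK-17823. Not a Flink API.
 */
class SketchInputChannel {

    /** Hypothetical buffer that fails loudly if it is recycled twice. */
    static final class SketchBuffer {
        private boolean recycled;

        synchronized void recycle() {
            if (recycled) {
                throw new IllegalStateException("Buffer recycled twice");
            }
            recycled = true;
        }

        synchronized boolean isRecycled() {
            return recycled;
        }
    }

    // Shared by the task thread and the canceler thread; guarded by its own monitor.
    private final ArrayDeque<SketchBuffer> receivedBuffers = new ArrayDeque<>();

    // Read and written only while holding the receivedBuffers lock.
    private boolean isReleased;

    /** Producer side: enqueue a buffer, or recycle it at once if the channel is released. */
    void onBuffer(SketchBuffer buffer) {
        boolean recycleNow;
        synchronized (receivedBuffers) {
            recycleNow = isReleased;
            if (!recycleNow) {
                receivedBuffers.add(buffer);
            }
        }
        if (recycleNow) {
            buffer.recycle();
        }
    }

    /** Task thread: poll under the lock and fail with a clear message instead of an NPE. */
    SketchBuffer getNextBuffer() {
        synchronized (receivedBuffers) {
            if (isReleased) {
                throw new IllegalStateException("Input channel has already been released");
            }
            SketchBuffer next = receivedBuffers.poll();
            if (next == null) {
                throw new IllegalStateException("No buffer available in the input channel");
            }
            return next;
        }
    }

    /** Canceler thread: mark released and drain the queue under the same lock. */
    void releaseAllResources() {
        ArrayDeque<SketchBuffer> toRecycle;
        synchronized (receivedBuffers) {
            if (isReleased) {
                return; // idempotent: a second release must not recycle anything again
            }
            isReleased = true;
            toRecycle = new ArrayDeque<>(receivedBuffers);
            receivedBuffers.clear();
        }
        // Each buffer was removed from the shared queue under the lock, so the
        // task thread can no longer poll (and later recycle) any of them.
        for (SketchBuffer buffer : toRecycle) {
            buffer.recycle();
        }
    }
}
```

In this sketch the buffers are recycled after the lock is dropped, so no recycling work runs while the queue lock is held; the issue text does not say whether the real RemoteInputChannel makes the same choice.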