[ https://issues.apache.org/jira/browse/FLINK-18050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijiang closed FLINK-18050.
----------------------------
    Resolution: Fixed

Merged in master: ed7b0b1bea84a10ee45d10343f239cd183659a74, f2dd4b8500a82532dae17087c227ce34e1aeac9b

Merged in release-1.11: a233c0ff82273ca59bb1decdb1ffb6020d27ccfd, 822e01b613b0b6821383f3cd5b0357054242b6a9

> Fix the bug of recycling buffer twice once exception in
> ChannelStateWriteRequestDispatcher#dispatch
> ---------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-18050
>                 URL: https://issues.apache.org/jira/browse/FLINK-18050
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.11.0
>            Reporter: Zhijiang
>            Assignee: Roman Khachatryan
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.11.0, 1.12.0
>
>
> When a task finishes, the `CheckpointBarrierUnaligner` declines the current
> checkpoint, which writes an abort request into the `ChannelStateWriter`.
> The abort request is executed before the other write output requests in the
> queue and closes the underlying `CheckpointStateOutputStream`. When the
> dispatcher then executes the next write output request and accesses the
> stream, a ClosedByInterruptException is thrown and the dispatcher thread exits.
> In this process, the underlying buffers of the current write output request
> are recycled twice:
> * ChannelStateCheckpointWriter#write recycles all the buffers in its finally
> block, which covers both the exceptional and the normal case.
> * ChannelStateWriteRequestDispatcherImpl#dispatch calls
> `request.cancel(e)` in the exceptional case, which recycles the underlying
> buffers again.
> This bug can cause a further exception in the network shuffle process, which
> references the same buffer as above. That exception is then sent to the
> downstream side and makes it fail.
>
> This bug can be reproduced easily by running
> UnalignedCheckpointITCase#shouldPerformUnalignedCheckpointOnParallelRemoteChannel.
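
For illustration only, the double recycle described in the issue can be reproduced in miniature with the sketch below. It does not use Flink's classes and is not the merged fix: `SketchBuffer`, `SketchRequest`, `DoubleRecycleSketch`, and the `guarded` flag are hypothetical names, and the guard shown (skip recycling in cancel once the writer has taken ownership of the buffer) is just one assumed way to avoid the second recycle.

```java
import java.io.IOException;
import java.nio.channels.ClosedByInterruptException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a network buffer that may be recycled only once. */
final class SketchBuffer {
    private final AtomicBoolean recycled = new AtomicBoolean(false);

    void recycle() {
        // A second recycle is exactly the situation the issue describes.
        if (!recycled.compareAndSet(false, true)) {
            throw new IllegalStateException("buffer recycled twice");
        }
    }

    boolean isRecycled() {
        return recycled.get();
    }
}

/** Hypothetical write request; 'guarded' toggles the assumed fix. */
final class SketchRequest {
    private final SketchBuffer buffer;
    private final boolean guarded;
    private boolean handedToWriter;

    SketchRequest(SketchBuffer buffer, boolean guarded) {
        this.buffer = buffer;
        this.guarded = guarded;
    }

    void execute(boolean streamClosed) throws IOException {
        handedToWriter = true; // from here on the writer owns the buffer
        DoubleRecycleSketch.write(buffer, streamClosed);
    }

    void cancel() {
        if (guarded && handedToWriter) {
            return; // the writer's finally block has already recycled the buffer
        }
        buffer.recycle();
    }
}

final class DoubleRecycleSketch {
    /** Mirrors the writer: the buffer is recycled in finally on success and on failure. */
    static void write(SketchBuffer buffer, boolean streamClosed) throws IOException {
        try {
            if (streamClosed) {
                // Stands in for accessing a stream that an earlier abort request closed.
                throw new ClosedByInterruptException();
            }
            // A real writer would serialize the buffer contents here.
        } finally {
            buffer.recycle();
        }
    }

    /** Mirrors the dispatcher: on failure the request is cancelled. */
    static void dispatch(SketchRequest request, boolean streamClosed) {
        try {
            request.execute(streamClosed);
        } catch (IOException e) {
            request.cancel();
        }
    }

    public static void main(String[] args) {
        try {
            dispatch(new SketchRequest(new SketchBuffer(), false), true);
        } catch (IllegalStateException e) {
            System.out.println("unguarded: " + e.getMessage());
        }
        dispatch(new SketchRequest(new SketchBuffer(), true), true);
        System.out.println("guarded: buffer recycled once");
    }
}
```

In the unguarded run, the finally block in `write` and the `cancel` call in `dispatch` both recycle the same buffer, so the second call fails. In the guarded run, the request remembers that ownership has passed to the writer and `cancel` leaves the buffer alone.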