[ 
https://issues.apache.org/jira/browse/FLINK-18050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijiang closed FLINK-18050.
----------------------------
    Resolution: Fixed

Merged in master: ed7b0b1bea84a10ee45d10343f239cd183659a74,
f2dd4b8500a82532dae17087c227ce34e1aeac9b

Merged in release-1.11: a233c0ff82273ca59bb1decdb1ffb6020d27ccfd,
822e01b613b0b6821383f3cd5b0357054242b6a9

> Fix the bug of recycling buffer twice once exception in 
> ChannelStateWriteRequestDispatcher#dispatch
> ---------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-18050
>                 URL: https://issues.apache.org/jira/browse/FLINK-18050
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.11.0
>            Reporter: Zhijiang
>            Assignee: Roman Khachatryan
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.11.0, 1.12.0
>
>
> When a task finishes, the `CheckpointBarrierUnaligner` declines the current 
> checkpoint, which writes an abort request into the `ChannelStateWriter`.
> The abort request is executed before the other write-output requests in the 
> queue and closes the underlying `CheckpointStateOutputStream`. When the 
> dispatcher then executes the next write-output request and accesses the 
> stream, a ClosedByInterruptException is thrown, which makes the dispatcher 
> thread exit.
> In this process, the underlying buffers of the current write-output request 
> are recycled twice:
>  * ChannelStateCheckpointWriter#write recycles all the buffers in its finally 
> block, which covers both the exceptional and the normal case.
>  * ChannelStateWriteRequestDispatcherImpl#dispatch calls 
> `request.cancel(e)` in the exception case, which recycles the same 
> underlying buffers again.
> This bug can cause a further exception in the network shuffle process, 
> which references the same buffers. That exception is then sent to the 
> downstream side and makes it fail.
>  
> This bug can be reproduced easily by running 
> UnalignedCheckpointITCase#shouldPerformUnalignedCheckpointOnParallelRemoteChannel.
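
The double recycle described in the issue can be sketched in isolation. The following is a minimal model, not Flink code: `CountingBuffer`, `WriteRequest`, `release`, and the `guarded` flag are hypothetical names introduced for illustration, and the idempotent release shown here is only one possible way to avoid the problem, not necessarily what the merged commits do.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

public class DoubleRecycleSketch {

    /** Hypothetical stand-in for a pooled buffer; it only counts recycle calls. */
    static final class CountingBuffer {
        int recycleCalls = 0;

        void recycle() {
            recycleCalls++;
        }
    }

    /** Hypothetical write request that owns exactly one buffer. */
    static final class WriteRequest {
        final CountingBuffer buffer = new CountingBuffer();
        private final boolean guarded;
        private final AtomicBoolean released = new AtomicBoolean(false);

        WriteRequest(boolean guarded) {
            this.guarded = guarded;
        }

        /** Unguarded: every call recycles. Guarded: only the first call recycles. */
        void release() {
            if (!guarded || released.compareAndSet(false, true)) {
                buffer.recycle();
            }
        }
    }

    /** Models the write path: the finally block recycles on success and on failure. */
    static void write(WriteRequest request, boolean streamClosed) throws IOException {
        try {
            if (streamClosed) {
                // Assumption: stands in for the exception thrown on a closed stream.
                throw new IOException("stream closed by an earlier abort request");
            }
        } finally {
            request.release();
        }
    }

    /** Models the dispatch path: on failure the request is cancelled, releasing again. */
    static int dispatch(WriteRequest request, boolean streamClosed) {
        try {
            write(request, streamClosed);
        } catch (IOException e) {
            request.release(); // plays the role of request.cancel(e)
        }
        return request.buffer.recycleCalls;
    }

    public static void main(String[] args) {
        System.out.println(dispatch(new WriteRequest(false), true));  // prints 2
        System.out.println(dispatch(new WriteRequest(true), true));   // prints 1
        System.out.println(dispatch(new WriteRequest(false), false)); // prints 1
    }
}
```

In this model the unguarded request is recycled twice when the stream is already closed, while the guarded one is recycled once. The normal path recycles once in both variants, so the double recycle only shows up on the failure path.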



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
