Roman Khachatryan created FLINK-23862:
-----------------------------------------
Summary: Race condition while cancelling task during initialization
Key: FLINK-23862
URL: https://issues.apache.org/jira/browse/FLINK-23862
Project: Flink
Issue Type: Bug
Components: Runtime / Task
Affects Versions: 1.14.0
Reporter: Roman Khachatryan
Fix For: 1.14.0
While debugging the recent failures in FLINK-22889, I noticed that the
operator chain is sometimes not closed if the task is cancelled while it is
being initialized.
The reason is that in restore(), cleanUpInvoke() is only called if there was an
exception (including CancelTaskException).
The latter is only thrown if StreamTask.canceled is set, i.e. TaskCanceler has
called StreamTask.cancel().
So if the StreamTask is cancelled after restore() but before the normal invoke(),
it may neither close the operator chain nor perform the other cleanup.
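A heavily simplified, hypothetical model of the race, to make the window concrete. It is not Flink's code: SimplifiedStreamTask, runTask and the betweenSteps hook are illustrative names, and the assumption that the driver (standing in for Task) skips invoke() once it sees the task was cancelled after restore() is mine, not stated in this report. The hook injects the cancellation deterministically, so no real threads are needed.

```java
/** Hypothetical model of the restore()/invoke() cancellation window (not Flink code). */
public class CancelRaceSketch {

    /** Stand-in for Flink's CancelTaskException. */
    static class CancelTaskException extends RuntimeException {}

    static class SimplifiedStreamTask {
        volatile boolean canceled = false;
        boolean operatorChainClosed = false;

        void cancel() { canceled = true; }

        void restore() {
            try {
                if (canceled) throw new CancelTaskException();
                // ... initialize the operator chain and restore state ...
            } catch (RuntimeException e) {
                cleanUpInvoke(); // cleanup happens ONLY on this exception path
                throw e;
            }
        }

        void invoke() {
            try {
                // ... process records until end of input or cancellation ...
            } finally {
                cleanUpInvoke(); // the normal path cleans up here
            }
        }

        void cleanUpInvoke() { operatorChainClosed = true; }
    }

    /** Hypothetical driver standing in for Task; betweenSteps injects the race. */
    static void runTask(SimplifiedStreamTask task, Runnable betweenSteps) {
        try {
            task.restore();
        } catch (CancelTaskException e) {
            return; // restore() already cleaned up
        }
        betweenSteps.run();        // the window: restore() finished, invoke() not started
        if (task.canceled) return; // assumed: driver skips invoke(), so nobody cleans up
        task.invoke();
    }

    public static void main(String[] args) {
        SimplifiedStreamTask normal = new SimplifiedStreamTask();
        runTask(normal, () -> {});
        System.out.println("no cancel: chain closed = " + normal.operatorChainClosed);

        SimplifiedStreamTask raced = new SimplifiedStreamTask();
        runTask(raced, raced::cancel);
        System.out.println("cancel in window: chain closed = " + raced.operatorChainClosed);
    }
}
```

Running main prints "chain closed = true" for the uncancelled task and "chain closed = false" when cancel() lands in the window, which is the leak described above.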
One solution is to make StreamTask.cleanup visible to and called from Task.
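A sketch of that proposal in the same hypothetical, simplified model (again not Flink's code; all names are illustrative). The cleanup method is made visible to the driver and called from a finally block, so it runs on every path, including an early return after cancellation in the restore()/invoke() window. It is written to be idempotent, on the assumption that a real implementation may reach it from more than one place.

```java
/** Hypothetical model of the proposed fix: the driver always calls cleanup (not Flink code). */
public class CancelFixSketch {

    /** Stand-in for Flink's CancelTaskException. */
    static class CancelTaskException extends RuntimeException {}

    static class SimplifiedStreamTask {
        volatile boolean canceled = false;
        boolean operatorChainClosed = false;

        void cancel() { canceled = true; }

        void restore() {
            if (canceled) throw new CancelTaskException();
            // ... initialize the operator chain and restore state ...
        }

        void invoke() {
            // ... process records until end of input or cancellation ...
        }

        /** Visible to the driver; idempotent, so repeated calls are harmless. */
        void cleanUp() {
            if (operatorChainClosed) return;
            operatorChainClosed = true; // ... close the operator chain, release resources ...
        }
    }

    /** Hypothetical driver standing in for Task; betweenSteps injects the race. */
    static void runTask(SimplifiedStreamTask task, Runnable betweenSteps) {
        try {
            task.restore();
            betweenSteps.run();        // same window as before
            if (task.canceled) return; // early return no longer skips cleanup
            task.invoke();
        } catch (CancelTaskException e) {
            // cancellation is expected; cleanup still happens in finally
        } finally {
            task.cleanUp();            // runs on every path
        }
    }

    public static void main(String[] args) {
        SimplifiedStreamTask raced = new SimplifiedStreamTask();
        runTask(raced, raced::cancel);
        System.out.println("cancel in window: chain closed = " + raced.operatorChainClosed);
    }
}
```

The design choice here is to move the responsibility for cleanup to the caller that owns the task's lifecycle, rather than relying on each phase to clean up on its own exception path.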
cc: [~akalashnikov], [~pnowojski]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)