Roman Khachatryan created FLINK-23862:
-----------------------------------------
             Summary: Race condition while cancelling task during initialization
                 Key: FLINK-23862
                 URL: https://issues.apache.org/jira/browse/FLINK-23862
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Task
    Affects Versions: 1.14.0
            Reporter: Roman Khachatryan
             Fix For: 1.14.0

While debugging the recent failures in FLINK-22889, I see that the operator chain is sometimes not closed if the task is cancelled while it is being initialized.

The reason is that in restore(), cleanUpInvoke() is only called if there was an exception, including CancelTaskException. The latter is only thrown if StreamTask.canceled is set, i.e. TaskCanceler has called StreamTask.cancel().

So if StreamTask is cancelled between restore() and the normal invoke(), it may not close the operator chain or perform the other cleanup.

One solution is to make StreamTask.cleanup visible to, and called from, Task.

cc: [~akalashnikov], [~pnowojski]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
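
The interleaving described in this issue can be illustrated with a small, self-contained Java model. This is not Flink code: SketchTask, runRelyingOnTask and runWithCallerCleanup are hypothetical names. The sketch assumes the caller skips invoke() once it observes the cancellation, and it simulates the canceler thread with a plain method call so that the interleaving is deterministic.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class CancelDuringInitSketch {

    /** Hypothetical stand-in for a stream task; it is NOT Flink's StreamTask. */
    static class SketchTask {
        final AtomicBoolean canceled = new AtomicBoolean(false);
        final AtomicBoolean cleanedUp = new AtomicBoolean(false);

        /** Initialization: cleans up only if an exception is thrown. */
        void restore() {
            try {
                if (canceled.get()) {
                    throw new IllegalStateException("canceled during restore");
                }
                // ... the operator chain would be initialized here ...
            } catch (RuntimeException e) {
                cleanup();
                throw e;
            }
        }

        /** Normal processing: always cleans up when it finishes. */
        void invoke() {
            try {
                // ... records would be processed here ...
            } finally {
                cleanup();
            }
        }

        void cancel() {
            canceled.set(true);
        }

        /** Idempotent, so calling it from several places is harmless. */
        void cleanup() {
            cleanedUp.compareAndSet(false, true);
        }
    }

    /** The caller relies on restore()/invoke() to clean up. Returns whether cleanup ran. */
    static boolean runRelyingOnTask(boolean cancelAfterRestore) {
        SketchTask task = new SketchTask();
        task.restore();
        if (cancelAfterRestore) {
            task.cancel(); // simulates the canceler firing between the two phases
        }
        if (!task.canceled.get()) {
            task.invoke();
        }
        return task.cleanedUp.get();
    }

    /** The caller owns the cleanup, along the lines of the proposed solution. */
    static boolean runWithCallerCleanup(boolean cancelAfterRestore) {
        SketchTask task = new SketchTask();
        try {
            task.restore();
            if (cancelAfterRestore) {
                task.cancel();
            }
            if (!task.canceled.get()) {
                task.invoke();
            }
        } finally {
            task.cleanup(); // runs on every path, including the skipped invoke()
        }
        return task.cleanedUp.get();
    }

    public static void main(String[] args) {
        System.out.println("relying on task, cancel after restore: "
                + runRelyingOnTask(true));
        System.out.println("caller cleanup, cancel after restore:  "
                + runWithCallerCleanup(true));
    }
}
```

In this model the first variant leaves the task without cleanup when the cancellation lands after restore() has returned, while the second variant cleans up on every path. Making cleanup() idempotent is a design choice of the sketch, so that the extra call from the caller is safe when invoke() has already cleaned up.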