Roman Khachatryan created FLINK-23862:
-----------------------------------------

             Summary: Race condition while cancelling task during initialization
                 Key: FLINK-23862
                 URL: https://issues.apache.org/jira/browse/FLINK-23862
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Task
    Affects Versions: 1.14.0
            Reporter: Roman Khachatryan
             Fix For: 1.14.0


While debugging the recent failures in FLINK-22889, I noticed that the operator
chain is sometimes not closed if the task is cancelled while it is being
initialized.

The reason is that in restore(), cleanUpInvoke() is only called if there was an 
exception, including CancelTaskException.

The latter is only thrown if StreamTask.canceled is set, i.e. TaskCanceler has 
called StreamTask.cancel().

So if the StreamTask is cancelled between restore() and the normal invoke(), it
may neither close the operator chain nor perform any other cleanup.
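
The window described above can be modelled roughly as follows. This is only a
simplified, single-threaded sketch and not the actual Flink code: the classes
SketchStreamTask and CancelDuringInitSketch, the run() helper and the
chainClosed flag are hypothetical names introduced for illustration, and the
cancellation that would normally arrive from TaskCanceler on another thread is
simulated inline.

```java
class CancelDuringInitSketch {
    /** Stand-in for Flink's CancelTaskException (hypothetical, local to this sketch). */
    static class CancelTaskException extends RuntimeException {}

    /** Very reduced model of StreamTask; not the real class. */
    static class SketchStreamTask {
        boolean canceled;
        boolean chainClosed;

        void restore() {
            try {
                // The operator chain would be created and state restored here.
                if (canceled) {
                    throw new CancelTaskException();
                }
            } catch (RuntimeException e) {
                // Cleanup happens only on the exceptional path.
                cleanUpInvoke();
                throw e;
            }
        }

        void invoke() {
            try {
                // The main processing loop would run here.
            } finally {
                cleanUpInvoke();
            }
        }

        void cancel() {
            canceled = true;
        }

        private void cleanUpInvoke() {
            chainClosed = true;
        }
    }

    /** Models the caller (Task); returns whether the operator chain was closed. */
    static boolean run(boolean cancelAfterRestore) {
        SketchStreamTask task = new SketchStreamTask();
        try {
            task.restore();
            if (cancelAfterRestore) {
                // Cancellation lands after restore() has returned normally,
                // so the caller never reaches invoke().
                task.cancel();
                throw new CancelTaskException();
            }
            task.invoke();
        } catch (CancelTaskException e) {
            // Treated as a regular cancellation; no cleanup is triggered here.
        }
        return task.chainClosed;
    }

    public static void main(String[] args) {
        System.out.println("uninterrupted run closed chain: " + run(false));
        System.out.println("cancelled after restore closed chain: " + run(true));
    }
}
```

In this model the chain is closed on the uninterrupted path but stays open when
the cancellation falls between the two calls.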

One solution is to make StreamTask.cleanup visible to Task and call it from there.
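
A minimal sketch of that idea, under the assumption that the caller can invoke
cleanup on every exit path. The names CleanupFromTaskSketch, SketchStreamTask,
cleanUp() and cleanupCalls are hypothetical and only illustrate the shape of
the change; the eventual fix in Flink may look different.

```java
class CleanupFromTaskSketch {
    /** Stand-in for Flink's CancelTaskException (hypothetical, local to this sketch). */
    static class CancelTaskException extends RuntimeException {}

    /** Reduced model of StreamTask whose cleanup is visible to the caller. */
    static class SketchStreamTask {
        private boolean canceled;
        int cleanupCalls;

        void restore() {
            if (canceled) {
                throw new CancelTaskException();
            }
        }

        void invoke() {
            // The main processing loop would run here.
        }

        void cancel() {
            canceled = true;
        }

        /** No longer private: the caller is responsible for invoking it. */
        void cleanUp() {
            cleanupCalls++;
        }
    }

    /** Models the caller (Task); returns how many times cleanup ran. */
    static int run(boolean cancelAfterRestore) {
        SketchStreamTask task = new SketchStreamTask();
        try {
            task.restore();
            if (cancelAfterRestore) {
                // Same window as before: cancelled after restore(), before invoke().
                task.cancel();
                throw new CancelTaskException();
            }
            task.invoke();
        } catch (CancelTaskException e) {
            // Cancellation is not treated as an error here.
        } finally {
            // Cleanup runs exactly once regardless of how the task ended.
            task.cleanUp();
        }
        return task.cleanupCalls;
    }
}
```

Because the caller owns the cleanup call, it no longer matters whether the
cancellation is observed inside restore(), between the two calls, or during
invoke().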

cc: [~akalashnikov], [~pnowojski]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
