XComp commented on pull request #14798:
URL: https://github.com/apache/flink/pull/14798#issuecomment-778281634


   The test instability is caused by the change adding the requirement that 
`failureCause` need to be set if the `executionState` is `FAILED`. The failing 
test runs a task but doesn't wait for the task to be finished. The 
`TaskExecutor` triggers the cancellation of the `Task` through 
[Task.failExternally](https://github.com/XComp/flink/blob/354eb574c98bbc5fdaedabc9f3ca7e4acfaa3746/flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java#L1129)
 passing in a `FlinkException: Disconnect from JobManager responsible for 
[...]`. This will trigger the `Task` to transition into `FAILED`. A context 
switch happens after the state is set but before the `failureCause` is set. 
Then, the `Task` finishes in the main thread, cleans up and calls 
[notifyFinalState](https://github.com/XComp/flink/blob/354eb574c98bbc5fdaedabc9f3ca7e4acfaa3746/flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java#L904)
 as part of this which initializes a `TaskExecutionState` and run
 s into the `IllegalStateException` since the `failureCause` is still not set.
   
   The solution would be to do the state change and `failureCause` atomically. 
But I'm not sure whether this would be a performance problem. We would have to 
switch from 
[compareAndSet](https://github.com/XComp/flink/blob/354eb574c98bbc5fdaedabc9f3ca7e4acfaa3746/flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java#L1064)
 to a normal lock, I guess. Alternatively, we could remove the invariant again 
and handle the `null` in the exception history. @tillrohrmann what are your 
thoughts on that?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to