StephanEwen commented on a change in pull request #12137:
URL: https://github.com/apache/flink/pull/12137#discussion_r426152842



##########
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java
##########
@@ -186,10 +188,18 @@ private void maybeHandleTaskFailure(final 
TaskExecutionState taskExecutionState,
 
        private void handleTaskFailure(final ExecutionVertexID 
executionVertexId, @Nullable final Throwable error) {
                setGlobalFailureCause(error);
+               notifyCoordinatorsAboutTaskFailure(executionVertexId, error);

Review comment:
       I think we may need to remove that check anyways. It is quite possible 
that tasks fail immediately after they were deployed, before they could 
register.
   
   The `OperatorCoordinator` interface also has no notion of "registering", so 
it cannot decide which notifications to forward and which not. I'd rather err 
on the "forward too many" side, because missing a notification easily leads to 
a stall or data loss (split not added back, result is missing).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to