[ https://issues.apache.org/jira/browse/FLINK-20672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784035#comment-17784035 ]
Yun Tang commented on FLINK-20672:
----------------------------------

[~Zakelly] Thanks for the information.

If so, I have another question: do we really need the {{io-executor}} to work with {{FatalExitExceptionHandler}}? From my point of view, if we fail to delete a savepoint correctly (this is also executed on the {{io-executor}}), do we really need to fail the whole JobManager?

If the correct behavior of the {{io-executor}}'s exception handler is not a fatal exit, I think we should correct that behavior first.

[~Zakelly], [~roman], [~srichter] WDYT?

> notifyCheckpointAborted RPC failure can fail JM
> -----------------------------------------------
>
>                 Key: FLINK-20672
>                 URL: https://issues.apache.org/jira/browse/FLINK-20672
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>   Affects Versions: 1.11.3, 1.12.0
>            Reporter: Roman Khachatryan
>            Assignee: Zakelly Lan
>            Priority: Not a Priority
>              Labels: auto-deprioritized-major, auto-deprioritized-minor,
> pull-request-available
>
> Introduced in FLINK-8871, aborted RPC notifications are done asynchronously:
>
> {code}
> private void sendAbortedMessages(long checkpointId, long timeStamp) {
>     // send notification of aborted checkpoints asynchronously.
>     executor.execute(() -> {
>         // send the "abort checkpoint" messages to necessary vertices.
>         // ..
>     });
> }
> {code}
> However, the executor that eventually executes this request is created as follows:
> {code}
> final ScheduledExecutorService futureExecutor = Executors.newScheduledThreadPool(
>     Hardware.getNumberCPUCores(),
>     new ExecutorThreadFactory("jobmanager-future"));
> {code}
> ExecutorThreadFactory uses an UncaughtExceptionHandler that exits the JVM on error.
> cc: [~yunta]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
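
For illustration only, here is a minimal plain-JDK sketch of the mechanism under discussion: an exception that escapes a task handed to {{execute()}} on a fixed thread pool terminates the worker thread and is passed to that thread's {{UncaughtExceptionHandler}}. This is not Flink code. The class name {{UncaughtHandlerSketch}}, the thread name, and the simulated failure are hypothetical, and the handler here only records the error, whereas a fatal handler would exit the JVM at that point.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; not Flink's ExecutorThreadFactory or FatalExitExceptionHandler.
class UncaughtHandlerSketch {

    /** Runs one failing task via execute() and returns what the thread's handler observed. */
    static Throwable runFailingTask() {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        CountDownLatch handled = new CountDownLatch(1);

        // Stand-in for a thread factory that installs a handler on every thread.
        // A fatal handler would exit the JVM here; this one only records the error.
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, "sketch-executor");
            t.setUncaughtExceptionHandler((thread, error) -> {
                seen.set(error);
                handled.countDown();
            });
            return t;
        };

        ExecutorService executor = Executors.newFixedThreadPool(1, factory);
        try {
            // execute() on a plain thread pool runs the Runnable directly, so the
            // exception escapes the worker thread and reaches the handler.
            executor.execute(() -> {
                throw new IllegalStateException("simulated RPC failure");
            });
            // Wait (bounded) until the handler has run.
            handled.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdown();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("handler saw: " + runFailingTask());
    }
}
```

By contrast, a task passed to {{submit()}} has its exception captured in the returned {{Future}}, so the handler is not invoked. Which of these paths applies depends on the executor and on how the task is handed over.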