Roman Khachatryan created FLINK-21053:
-----------------------------------------
Summary: Prevent further RejectedExecutionExceptions in
CheckpointCoordinator failing JM
Key: FLINK-21053
URL: https://issues.apache.org/jira/browse/FLINK-21053
Project: Flink
Issue Type: Improvement
Components: Runtime / Checkpointing
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
Fix For: 1.13.0
In the past, multiple bugs were caused by throwing or handling
RejectedExecutionException in CheckpointCoordinator (FLINK-18290, FLINK-20992).
I think further bugs of this kind are still possible, because in many places an
executor that may already be shut down is passed to CompletableFuture.xxxAsync calls.
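To illustrate the failure mode, here is a minimal standalone sketch using only the JDK. It is not CheckpointCoordinator code; the class and method names are hypothetical. It shows that CompletableFuture.runAsync throws RejectedExecutionException to the caller when the supplied executor has already been shut down.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical demo class, not part of Flink.
final class RejectedAfterShutdownDemo {

    /** Returns true if scheduling async work on a shut-down executor is rejected. */
    static boolean isRejectedAfterShutdown() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.shutdown(); // simulates the coordinator's executor being stopped first
        try {
            // The default rejection policy (AbortPolicy) throws on execute() after shutdown,
            // and runAsync lets that exception propagate to the caller.
            CompletableFuture.runAsync(() -> { }, executor);
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected after shutdown: " + isRejectedAfterShutdown());
    }
}
```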
In FLINK-20992 we discussed two approaches to fix this.
The first approach is to check the executor state inside a synchronized block
every time it is used.
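A rough sketch of what this first approach could look like, assuming a hypothetical wrapper (GuardedExecutor, tryExecute) that is the only place where the executor is used and shut down. This is an illustration, not the code discussed in FLINK-20992.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical wrapper, not part of Flink.
final class GuardedExecutor {
    private final Object lock = new Object();
    private final ExecutorService executor;
    private boolean shutdown; // guarded by lock

    GuardedExecutor(ExecutorService executor) {
        this.executor = executor;
    }

    /** Submits the task unless shutdown() was already called; returns whether it was accepted. */
    boolean tryExecute(Runnable task) {
        synchronized (lock) {
            if (shutdown) {
                return false; // skip quietly instead of hitting RejectedExecutionException
            }
            executor.execute(task);
            return true;
        }
    }

    /** Marks the wrapper as shut down and stops the executor under the same lock. */
    void shutdown() {
        synchronized (lock) {
            shutdown = true;
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        GuardedExecutor guarded = new GuardedExecutor(Executors.newSingleThreadExecutor());
        System.out.println("accepted before shutdown: " + guarded.tryExecute(() -> { }));
        guarded.shutdown();
        System.out.println("accepted after shutdown: " + guarded.tryExecute(() -> { }));
    }
}
```

The guard only helps if every use of the executor goes through the same lock, which is why the check has to be repeated at each call site.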
The second approach is to:
# Create the executors inside CheckpointCoordinator (both the io and timer thread pools)
# Check isShutdown() in their error handlers (if the executor is shut down and the
error is a RejectedExecutionException, just log it; otherwise delegate to
FatalExitExceptionHandler)
# (This will allow removing such RejectedExecutionException checks from the
coordinator code)
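The second approach could be sketched roughly as follows. This assumes the "error handler" is a Thread.UncaughtExceptionHandler installed through the thread factory; ShutdownAwareHandler, createExecutor and the "checkpoint-io" thread name are hypothetical, and the fatalHandler parameter stands in for FatalExitExceptionHandler, whose API is not reproduced here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

// Hypothetical handler, not part of Flink.
final class ShutdownAwareHandler implements Thread.UncaughtExceptionHandler {
    private final BooleanSupplier isShutdown;
    private final Thread.UncaughtExceptionHandler fatalHandler; // stand-in for FatalExitExceptionHandler

    ShutdownAwareHandler(BooleanSupplier isShutdown, Thread.UncaughtExceptionHandler fatalHandler) {
        this.isShutdown = isShutdown;
        this.fatalHandler = fatalHandler;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        if (e instanceof RejectedExecutionException && isShutdown.getAsBoolean()) {
            // Expected during shutdown: only log.
            System.err.println("Ignoring rejection after shutdown in " + t.getName() + ": " + e);
        } else {
            fatalHandler.uncaughtException(t, e);
        }
    }

    /** Creates an executor whose worker threads report errors to a ShutdownAwareHandler. */
    static ExecutorService createExecutor(Thread.UncaughtExceptionHandler fatalHandler) {
        AtomicReference<ExecutorService> ref = new AtomicReference<>();
        Thread.UncaughtExceptionHandler handler = new ShutdownAwareHandler(
                () -> ref.get() != null && ref.get().isShutdown(), fatalHandler);
        ThreadFactory factory = runnable -> {
            Thread thread = new Thread(runnable, "checkpoint-io");
            thread.setUncaughtExceptionHandler(handler);
            return thread;
        };
        ExecutorService executor = Executors.newSingleThreadExecutor(factory);
        ref.set(executor);
        return executor;
    }

    public static void main(String[] args) {
        ExecutorService executor = createExecutor(
                (t, e) -> System.err.println("fatal handler called: " + e));
        // Any other failure is still delegated to the fatal handler.
        executor.execute(() -> { throw new IllegalStateException("boom"); });
        executor.shutdown();
    }
}
```

Because the coordinator owns the executor in this sketch, the handler can ask it directly whether it is shut down. Note that an uncaught exception handler only sees exceptions that escape a worker thread, so how rejections reach the handler in the real coordinator is left open here.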