[jira] [Updated] (FLINK-21053) Prevent potential RejectedExecutionExceptions in CheckpointCoordinator failing JM

Weijie Guo (Jira) Tue, 18 Mar 2025 03:00:26 -0700


     [ https://issues.apache.org/jira/browse/FLINK-21053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Weijie Guo updated FLINK-21053:
-------------------------------
    Affects Version/s: 2.1.0

> Prevent potential RejectedExecutionExceptions in CheckpointCoordinator failing JM
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-21053
>                 URL: https://issues.apache.org/jira/browse/FLINK-21053
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 2.1.0
>            Reporter: Roman Khachatryan
>            Priority: Minor
>              Labels: auto-unassigned
>             Fix For: 2.0.0
>
>
> In the past, there were multiple bugs caused by throwing/handling 
> RejectedExecutionException in CheckpointCoordinator (FLINK-18290, 
> FLINK-20992).
>  
> I think such bugs are still possible, because there are many places where an
> executor that may already be shut down is passed to CompletableFuture.xxxAsync
> calls.
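To make the failure mode concrete: under the JDK's default rejection policy (AbortPolicy), passing an executor that is already shut down to CompletableFuture.runAsync makes runAsync itself throw RejectedExecutionException on the calling thread. The minimal sketch below assumes a plain single-thread ExecutorService as a stand-in for the coordinator's I/O pool. The class and method names are illustrative and are not Flink code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Illustrative only: shows what happens when work is handed to a shut-down executor.
class ShutdownExecutorDemo {

    /** Returns true if scheduling onto an already shut-down executor is rejected. */
    static boolean isRejectedAfterShutdown() {
        ExecutorService ioExecutor = Executors.newSingleThreadExecutor();
        // Simulates the executor being shut down while checkpoint work is still pending.
        ioExecutor.shutdown();
        try {
            CompletableFuture.runAsync(() -> System.out.println("cleanup"), ioExecutor);
            return false;
        } catch (RejectedExecutionException e) {
            // With the default AbortPolicy, the rejection surfaces synchronously
            // from runAsync, i.e. on the caller's thread.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + isRejectedAfterShutdown());
    }
}
```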
>  
> In FLINK-20992 we discussed two approaches to fix this.
> The first approach is to check the executor state inside a synchronized block
> every time the executor is used.
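A minimal sketch of the first approach follows. It assumes a hypothetical LockGuardedCoordinator class with a runIfRunning method, not Flink's actual CheckpointCoordinator. Both shutdown and every submission take the same lock, so the isShutdown() check and the submission cannot interleave with a shutdown. The guarantee only holds if every call site goes through such a guard, and that per-call-site burden is what the second approach aims to remove.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical class illustrating approach 1; not Flink's CheckpointCoordinator.
class LockGuardedCoordinator {
    private final Object lock = new Object();
    private final ExecutorService ioExecutor = Executors.newSingleThreadExecutor();

    /** Schedules the action, or returns empty if the executor is already shut down. */
    Optional<CompletableFuture<Void>> runIfRunning(Runnable action) {
        synchronized (lock) {
            // shutdown() takes the same lock, so the executor cannot be shut
            // down between this check and the submission below.
            if (ioExecutor.isShutdown()) {
                return Optional.empty();
            }
            return Optional.of(CompletableFuture.runAsync(action, ioExecutor));
        }
    }

    void shutdown() {
        synchronized (lock) {
            ioExecutor.shutdown();
        }
    }
}
```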
> The second approach is to:
>  # Create the executors (both the I/O and the timer thread pools) inside
> CheckpointCoordinator.
>  # Check isShutdown() in their RejectedExecutionHandlers: if the executor is
> shut down and the exception is a RejectedExecutionException, just log it;
> otherwise, delegate to FatalExitExceptionHandler.
>  # (This would let us remove such RejectedExecutionException checks from the
> coordinator code.)
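The second approach could look roughly like the sketch below. It assumes a plain java.util.concurrent.ThreadPoolExecutor owned by the coordinator. It also takes a hypothetical Thread.UncaughtExceptionHandler parameter as a stand-in for Flink's FatalExitExceptionHandler, so the example has no Flink dependency. Tasks rejected after shutdown are dropped with a log line, and any other rejection is escalated.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

/**
 * Hypothetical rejection handler for approach 2. Rejections after shutdown are
 * expected and only logged; any other rejection goes to a fatal handler (a
 * stand-in for FatalExitExceptionHandler in this sketch).
 */
class ShutdownAwareRejectionHandler implements RejectedExecutionHandler {
    private static final Logger LOG =
            Logger.getLogger(ShutdownAwareRejectionHandler.class.getName());
    private final Thread.UncaughtExceptionHandler fatalHandler;

    ShutdownAwareRejectionHandler(Thread.UncaughtExceptionHandler fatalHandler) {
        this.fatalHandler = fatalHandler;
    }

    @Override
    public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
        if (executor.isShutdown()) {
            // Expected while the coordinator shuts down: drop the task, just log.
            LOG.fine("Dropping task rejected by shut-down executor: " + task);
            return;
        }
        // Rejected while still running (e.g. a bounded queue is full): treat as fatal.
        fatalHandler.uncaughtException(Thread.currentThread(),
                new RejectedExecutionException("Task " + task + " rejected from " + executor));
    }

    /** Creates an executor owned by the coordinator (step 1), using this handler (step 2). */
    static ThreadPoolExecutor newCoordinatorExecutor(Thread.UncaughtExceptionHandler fatalHandler) {
        return new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(), new ShutdownAwareRejectionHandler(fatalHandler));
    }
}
```

One caveat of dropping silently: a future returned by CompletableFuture.runAsync for a dropped task never completes, so code that waits on such futures after shutdown could hang unless it also checks the coordinator state. For the timer pool, ScheduledThreadPoolExecutor has a constructor that accepts a RejectedExecutionHandler, so the same handler could be reused there.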
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
