[ https://issues.apache.org/jira/browse/FLINK-26114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Pohl updated FLINK-26114:
----------------------------------
    Description:
In contrast to the {{AdaptiveScheduler}}, the {{DefaultScheduler}} fails fatally in case of an error while cleaning up the checkpoint-related resources. This contradicts our new approach of retrying the cleanup of job-related data (see FLINK-25433). Instead, the {{DefaultScheduler}} should return a future that is completed exceptionally with that error. This enables the {{DefaultResourceCleaner}} to trigger a retry.

Neither scheduler implementation currently exposes errors that occur during shutdown of the {{CompletedCheckpointStore}} or the {{CheckpointIDCounter}}. This would need to be addressed as well.

  was:
In contrast to the {{AdaptiveScheduler}}, the {{DefaultScheduler}} fails fatally in case of an error while cleaning up the checkpoint-related resources. This contradicts our new approach of retrying the cleanup of job-related data (see FLINK-25433). Instead, we would want the {{DefaultScheduler}} to return an exceptionally completed future with the exception. This enables the {{DefaultResourceCleaner}} to trigger a retry.

> DefaultScheduler fails fatally in case of an error when shutting down the checkpoint-related resources
> ------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-26114
>                 URL: https://issues.apache.org/jira/browse/FLINK-26114
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.0
>            Reporter: Matthias Pohl
>            Assignee: Niklas Semmler
>            Priority: Critical
>
> In contrast to the {{AdaptiveScheduler}}, the {{DefaultScheduler}} fails
> fatally in case of an error while cleaning up the checkpoint-related
> resources. This contradicts our new approach of retrying the cleanup of
> job-related data (see FLINK-25433). Instead, the {{DefaultScheduler}} should
> return a future that is completed exceptionally with that error. This enables
> the {{DefaultResourceCleaner}} to trigger a retry.
>
> Neither scheduler implementation currently exposes errors that occur during
> shutdown of the {{CompletedCheckpointStore}} or the {{CheckpointIDCounter}}.
> This would need to be addressed as well.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
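
For illustration only, the behaviour requested in the description might look roughly like the sketch below. It does not use Flink code: CheckpointResources, closeAsync and cleanupWithRetry are hypothetical names, and the retry loop is only a stand-in for what the {{DefaultResourceCleaner}} would do. The point is that a cleanup error ends up in the returned future rather than being escalated as a fatal error, so the caller can decide to retry.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class CleanupRetrySketch {

    /** Hypothetical stand-in for the checkpoint-related resources; not a Flink interface. */
    interface CheckpointResources {
        void shutdown() throws Exception;
    }

    /**
     * Runs the shutdown and reports a failure through the returned future
     * instead of treating it as a fatal error.
     */
    static CompletableFuture<Void> closeAsync(CheckpointResources resources) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        try {
            resources.shutdown();
            result.complete(null);
        } catch (Exception e) {
            result.completeExceptionally(e);
        }
        return result;
    }

    /**
     * Hypothetical retry loop standing in for the cleaner. It relies on
     * closeAsync returning an already completed future, which holds in this
     * sketch only. Expects maxAttempts >= 1.
     */
    static CompletableFuture<Void> cleanupWithRetry(CheckpointResources resources, int maxAttempts) {
        CompletableFuture<Void> attempt = closeAsync(resources);
        for (int i = 1; i < maxAttempts && attempt.isCompletedExceptionally(); i++) {
            attempt = closeAsync(resources);
        }
        return attempt;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated resource that fails on the first two attempts, then succeeds.
        CheckpointResources flaky = () -> {
            if (calls.incrementAndGet() < 3) {
                throw new IOException("simulated cleanup failure");
            }
        };
        CompletableFuture<Void> outcome = cleanupWithRetry(flaky, 5);
        System.out.println("attempts=" + calls.get()
                + " failed=" + outcome.isCompletedExceptionally());
    }
}
```

If the resource kept failing, the last exceptionally completed future would be returned to the caller, so the error would remain visible instead of being swallowed.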