[ https://issues.apache.org/jira/browse/FLINK-26236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494727#comment-17494727 ]
Gyula Fora commented on FLINK-26236:
------------------------------------

Seems like the operator SDK provides out-of-the-box logic for retrying errors and setting a custom status by implementing a simple interface. Instead of catching the errors and setting the error status ourselves, we should probably use this directly:

[https://javaoperatorsdk.io/docs/features]

{{public interface ErrorStatusHandler<T extends HasMetadata> {}}

> Track and cap retries in ReconciliationStatus
> ---------------------------------------------
>
>                 Key: FLINK-26236
>                 URL: https://issues.apache.org/jira/browse/FLINK-26236
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Kubernetes Operator
>            Reporter: Gyula Fora
>            Priority: Major
>
> At the moment we retry errors again and again indefinitely. As suggested by
> [~t...@apache.org] we should cap the number of retries (or the time spent
> retrying).
> For this we can include a retry count in the reconciliation status.
> Also, we should distinguish fatal errors (like config errors) from
> recoverable errors with a different exception type; fatal errors should
> not be retried.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
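
For illustration only, here is a minimal sketch of the idea in the issue description: keep a retry count in the status, cap it, and never retry fatal errors. It is plain Java and deliberately does not use the operator SDK. Every name in it (FatalReconciliationException, ReconciliationStatusSketch, RetryPolicySketch, Decision) is hypothetical, and the real ReconciliationStatus or the SDK's ErrorStatusHandler may look quite different.

```java
/** Hypothetical exception type for errors that must not be retried (e.g. config errors). */
class FatalReconciliationException extends RuntimeException {
    FatalReconciliationException(String message) {
        super(message);
    }
}

/** Hypothetical stand-in for the status object; not the operator's actual class. */
class ReconciliationStatusSketch {
    int retryCount; // retries consumed since the last successful reconciliation
    String error;   // last error message, or null when healthy
}

/** Hypothetical policy that caps retries and short-circuits on fatal errors. */
class RetryPolicySketch {
    enum Decision { RETRY, GIVE_UP }

    private final int maxRetries;

    RetryPolicySketch(int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        this.maxRetries = maxRetries;
    }

    /** Records the error in the status and decides whether another attempt is allowed. */
    Decision onError(ReconciliationStatusSketch status, Exception e) {
        status.error = e.getMessage();
        if (e instanceof FatalReconciliationException) {
            return Decision.GIVE_UP; // fatal: retrying cannot help
        }
        if (status.retryCount >= maxRetries) {
            return Decision.GIVE_UP; // recoverable, but the cap is reached
        }
        status.retryCount++;
        return Decision.RETRY;
    }

    /** Clears the error and resets the counter after a successful reconciliation. */
    void onSuccess(ReconciliationStatusSketch status) {
        status.retryCount = 0;
        status.error = null;
    }
}
```

With a cap of 2, two recoverable failures in a row are retried and the third gives up, while a FatalReconciliationException gives up immediately without consuming a retry. A time-based cap (the "time spent retrying" variant mentioned in the issue) would need a timestamp in the status instead of, or in addition to, the counter.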