[ https://issues.apache.org/jira/browse/FLINK-35288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Keshav Kansal updated FLINK-35288:
----------------------------------
    Description: 
According to the documentation for the Fixed Delay Restart Strategy, *restart-strategy.fixed-delay.attempts* defines "The number of times that Flink retries the execution before the job is declared as failed if has been set to fixed-delay".

However, in practice it acts as the *maximum-total-task-failures*, i.e. it is possible that the job does not even attempt to restart. This behavior is documented in https://cwiki.apache.org/confluence/display/FLINK/FLIP-1%3A+Fine+Grained+Recovery+from+Task+Failures

If there is an outage at the sink level, for example an Elasticsearch outage, all the independent tasks might fail, and the job will immediately fail without restarting if restart-strategy.fixed-delay.attempts is set lower than or equal to the parallelism of the sink.

  was:
According to the documentation for the Fixed Delay Restart Strategy, restart-strategy.fixed-delay.attempts defines "The number of times that Flink retries the execution before the job is declared as failed if has been set to fixed-delay".

However, in practice it acts as the *maximum-total-task-failures*, i.e. it is possible that the job does not even attempt to restart. This behavior is documented in https://cwiki.apache.org/confluence/display/FLINK/FLIP-1%3A+Fine+Grained+Recovery+from+Task+Failures

If there is an outage at the sink level, for example an Elasticsearch outage, all the independent tasks might fail, and the job will immediately fail without restarting if restart-strategy.fixed-delay.attempts is set lower than or equal to the parallelism of the sink.
> Flink Restart Strategy does not work as documented
> --------------------------------------------------
>
>                 Key: FLINK-35288
>                 URL: https://issues.apache.org/jira/browse/FLINK-35288
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Keshav Kansal
>            Priority: Minor
>
> According to the documentation for the Fixed Delay Restart Strategy,
> *restart-strategy.fixed-delay.attempts* defines "The number of times that
> Flink retries the execution before the job is declared as failed if has been
> set to fixed-delay".
> However, in practice it acts as the *maximum-total-task-failures*, i.e. it is
> possible that the job does not even attempt to restart.
> This behavior is documented in
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-1%3A+Fine+Grained+Recovery+from+Task+Failures
> If there is an outage at the sink level, for example an Elasticsearch outage,
> all the independent tasks might fail, and the job will immediately fail
> without restarting if restart-strategy.fixed-delay.attempts is set lower than
> or equal to the parallelism of the sink.
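
To illustrate the difference between the two readings of restart-strategy.fixed-delay.attempts described in the report, here is a minimal toy model in Python. It is not Flink code: the names FixedDelayBudget and survives_outage and the values 3 and 8 are hypothetical, chosen only for illustration. The sketch assumes that every task failure consumes one unit of the attempts budget and that a restart is allowed while the count is at most the configured value. The report states that a setting equal to the sink parallelism also fails, so the exact boundary in Flink may differ from this sketch.

```python
from dataclasses import dataclass


@dataclass
class FixedDelayBudget:
    """Toy model of a fixed-delay restart budget (hypothetical, not Flink code)."""

    max_attempts: int
    failures_seen: int = 0

    def record_failure(self) -> bool:
        """Count one failure and report whether a restart is still allowed."""
        self.failures_seen += 1
        # Assumption: restarts are allowed while the count is within the budget.
        return self.failures_seen <= self.max_attempts


def survives_outage(max_attempts: int, failures_charged: int) -> bool:
    """Return True if the job may still restart after all failures are charged."""
    budget = FixedDelayBudget(max_attempts)
    for _ in range(failures_charged):
        if not budget.record_failure():
            return False
    return True


ATTEMPTS = 3          # hypothetical restart-strategy.fixed-delay.attempts
SINK_PARALLELISM = 8  # hypothetical number of independent sink tasks

# Reading 1 (as the documentation suggests): one outage costs one job-level retry.
job_level = survives_outage(ATTEMPTS, failures_charged=1)

# Reading 2 (as the report describes): every failing sink task is charged.
task_level = survives_outage(ATTEMPTS, failures_charged=SINK_PARALLELISM)

print(job_level, task_level)  # prints: True False
```

Under the first reading a single sink outage leaves two retries in reserve, whereas under the second reading the same outage exhausts the budget before any restart takes place, which matches the behavior the report describes.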