[jira] [Commented] (FLINK-32895) Introduce the max attempts for Exponential Delay Restart Strategy

2023-12-14 Thread Rui Fan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17797017#comment-17797017
 ] 

Rui Fan commented on FLINK-32895:
-

Merged to master (1.19) via:

1c7b10873be475b73d78a083eda6be71fbb13c2b

80e71a47662c70e5cc0d96bfa3962bd37a6d020d

3d4d396e68e6cf7f49b7cc8d94b2c9516ffc2b96

> Introduce the max attempts for Exponential Delay Restart Strategy
> -
>
> Key: FLINK-32895
> URL: https://issues.apache.org/jira/browse/FLINK-32895
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Reporter: Rui Fan
> Assignee: Rui Fan
> Priority: Major
> Labels: pull-request-available
>
> Currently, Flink has three restart strategies: fixed-delay, failure-rate and 
> exponential-delay.
> The exponential-delay strategy is suitable if a job continues to fail for a 
> period of time. The fixed-delay and failure-rate strategies have a max 
> attempts mechanism: the job stops restarting and fails once the number of 
> attempts exceeds the max attempts threshold. 
> The max attempts mechanism is reasonable: Flink should not, and does not need 
> to, restart a job indefinitely if it keeps failing. However, the 
> exponential-delay strategy has no max attempts mechanism.
> I propose introducing 
> `restart-strategy.exponential-delay.max-attempts-before-reset` to support the 
> max attempts mechanism for exponential-delay. With it, Flink won't restart the 
> job if the number of job failures before reset exceeds 
> max-attempts-before-reset when exponential-delay is enabled.
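
The check proposed in the description can be sketched in plain Java, with no 
Flink dependency. This is only an illustration under stated assumptions, not 
the merged implementation: the class `MaxAttemptsBeforeResetSketch`, its 
fields and `onFailure` are hypothetical names, and a "reset" is simplified to 
"the gap since the previous failure is at least a reset threshold".

```java
/**
 * Hypothetical sketch of a max-attempts check for an exponential-delay
 * restart strategy. Not Flink code; all names are illustrative only.
 */
final class MaxAttemptsBeforeResetSketch {
    private final int maxAttemptsBeforeReset;
    private final long resetThresholdMillis;

    private int attemptsSinceReset = 0;
    private boolean hasFailedBefore = false;
    private long lastFailureMillis = 0L;

    MaxAttemptsBeforeResetSketch(int maxAttemptsBeforeReset, long resetThresholdMillis) {
        this.maxAttemptsBeforeReset = maxAttemptsBeforeReset;
        this.resetThresholdMillis = resetThresholdMillis;
    }

    /** Records a failure at the given time and reports whether a restart is still allowed. */
    boolean onFailure(long nowMillis) {
        // Assumption: a quiet period of at least resetThresholdMillis since the
        // previous failure counts as a "reset" and clears the attempt counter.
        if (hasFailedBefore && nowMillis - lastFailureMillis >= resetThresholdMillis) {
            attemptsSinceReset = 0;
        }
        hasFailedBefore = true;
        lastFailureMillis = nowMillis;
        attemptsSinceReset++;
        // Restart only while the failure count has not exceeded the limit.
        return attemptsSinceReset <= maxAttemptsBeforeReset;
    }

    public static void main(String[] args) {
        MaxAttemptsBeforeResetSketch sketch = new MaxAttemptsBeforeResetSketch(3, 60_000L);
        for (long t = 0; t < 4_000; t += 1_000) {
            System.out.println("failure at " + t + " ms, restart allowed: " + sketch.onFailure(t));
        }
    }
}
```

With a limit of 3, the first three failures in a row still allow a restart and 
the fourth does not; a failure that arrives after a long enough quiet period 
starts counting from one again.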



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-32895) Introduce the max attempts for Exponential Delay Restart Strategy

2023-08-20 Thread Rui Fan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756636#comment-17756636
 ] 

Rui Fan commented on FLINK-32895:
-

Hi [~zhuzh][~wanglijie], would you mind taking a look at this JIRA in your 
free time? Thanks~

BTW, if the improvement is reasonable, it will add a new option and a new 
`exponentialDelayRestart` method in `RestartStrategies` (a PublicEvolving 
class). It's a small feature.

I'm not sure whether a FLIP is necessary whenever options are added or public 
classes are changed, even for a small feature. If yes, I can start a FLIP; if 
no, I will follow up here.



[jira] [Commented] (FLINK-32895) Introduce the max attempts for Exponential Delay Restart Strategy

2023-08-21 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756714#comment-17756714
 ] 

Zhu Zhu commented on FLINK-32895:
-

Thanks for the proposal! A FLIP is required because it includes changes to 
public interfaces (config options). And it is proposing a new feature which 
needs to be seen and set by users.

And maybe we can reconsider the new config option, to make it easier to 
understand, e.g. introduce a 
{{restart-strategy.exponential-delay.fail-on-exceeding-max-backoff}}.

I would also suggest not changing {{RestartStrategies}} any more because we 
are considering deprecating it later when improving Flink configuration. 
{{RestartStrategies}} is not flexible for custom restart strategies and can be 
superseded by config options.
Can we delay this work a bit, waiting for the result of the FLIP and ML 
discussion of the deprecation? It should happen soon. 



[jira] [Commented] (FLINK-32895) Introduce the max attempts for Exponential Delay Restart Strategy

2023-08-21 Thread Rui Fan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756752#comment-17756752
 ] 

Rui Fan commented on FLINK-32895:
-

Thanks [~zhuzh] for the quick feedback!

{quote}A FLIP is required because it includes changes to public interfaces 
(config options). And it is proposing a new feature which needs to be seen and 
set by users.{quote}

Got it, thanks for the clarification!

{quote}And maybe we can reconsider the new config option, to make it easier 
to understand, e.g. introduce a 
restart-strategy.exponential-delay.fail-on-exceeding-max-backoff.{quote}

Good suggestion. I will record it, and we can discuss the option on the 
mailing list later.

{quote}I would also suggest not changing RestartStrategies any more because we 
are considering deprecating it later when improving Flink configuration. 
RestartStrategies is not flexible for custom restart strategies and can be 
superseded by config options.
 {quote}

To be honest, our internal Flink platform and I always use config options 
instead of Java code for Flink configuration. So I totally agree with 
deprecating RestartStrategies.

{quote}Can we delay this work a bit, waiting for the result of the FLIP and ML 
discussion of the deprecation? It should happen soon.{quote}

Sure, this improvement can wait for the deprecation of RestartStrategies. 
Could you ping me once the discussion is started? Thanks a lot :)




[jira] [Commented] (FLINK-32895) Introduce the max attempts for Exponential Delay Restart Strategy

2023-08-21 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757219#comment-17757219
 ] 

Zhu Zhu commented on FLINK-32895:
-

Sure, I will ping you in time. [~fanrui]



[jira] [Commented] (FLINK-32895) Introduce the max attempts for Exponential Delay Restart Strategy

2023-09-09 Thread Rui Fan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763394#comment-17763394
 ] 

Rui Fan commented on FLINK-32895:
-

Hi [~zhuzh], I created FLIP-364 [1] in advance because I found several points 
in the restart strategy that need to be improved. We can discuss them on the 
mailing list in the future.

There are 2 options for discussion:
 * Option 1: Start discussing FLIP-364 after the deprecation of 
RestartStrategies has been discussed.
 * Option 2: FLIP-364 has several points that need to be discussed, so we can 
first discuss the parts of FLIP-364 other than RestartStrategies. The 
RestartStrategies part can then follow in your separate FLIP. 

WDYT?

BTW, after some more thought: 
restart-strategy.exponential-delay.fail-on-exceeding-max-backoff may not work 
well, because the user may want to restart the job multiple times at 
max-backoff before failing it.

For example, users don't want the delay time to be too long, so they set 
initial-backoff=1s, backoff-multiplier=2, max-backoff=30s. The delay times are 
then 1s, 2s, 4s, 8s, 16s, 30s, 30s, 30s, 30s, 30s, etc. If we introduced 
`fail-on-exceeding-max-backoff`, the job would no longer restart once the 
delay time reaches 30s for the first time, right?
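
The delay sequence in this example can be reproduced with a small plain-Java 
sketch. It is only an illustration: `BackoffSequenceSketch` and its methods are 
hypothetical names, jitter is ignored, and `restartsBeforeReachingMax` assumes 
the reading above, i.e. that `fail-on-exceeding-max-backoff` would fail the job 
the first time the delay reaches max-backoff.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a capped exponential backoff sequence. Not Flink code. */
final class BackoffSequenceSketch {

    /** Returns the first {@code count} delays in seconds, capped at {@code maxBackoff}. */
    static List<Long> delaysSeconds(long initialBackoff, double multiplier, long maxBackoff, int count) {
        List<Long> delays = new ArrayList<>();
        double current = initialBackoff;
        for (int i = 0; i < count; i++) {
            delays.add((long) Math.min(current, maxBackoff));
            // Grow the delay, but never beyond the configured maximum.
            current = Math.min(current * multiplier, maxBackoff);
        }
        return delays;
    }

    /** Counts the restarts that happen before the delay first reaches {@code maxBackoff}. */
    static int restartsBeforeReachingMax(List<Long> delays, long maxBackoff) {
        int restarts = 0;
        for (long delay : delays) {
            if (delay >= maxBackoff) {
                break;
            }
            restarts++;
        }
        return restarts;
    }

    public static void main(String[] args) {
        // initial-backoff=1s, backoff-multiplier=2, max-backoff=30s
        List<Long> delays = delaysSeconds(1, 2.0, 30, 10);
        System.out.println(delays); // [1, 2, 4, 8, 16, 30, 30, 30, 30, 30]
        System.out.println(restartsBeforeReachingMax(delays, 30)); // 5
    }
}
```

Under that reading the job gets only five restarts (at 1s, 2s, 4s, 8s and 16s) 
and never restarts at the 30s delay, whereas a max-attempts limit lets the user 
choose how many restarts at max-backoff are acceptable.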

Please correct me if I'm wrong. I'm looking forward to more feedback from the 
community, thanks~

 

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-364%3A+Improve+the+restart-strategy



[jira] [Commented] (FLINK-32895) Introduce the max attempts for Exponential Delay Restart Strategy

2023-09-10 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-32895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763530#comment-17763530
 ] 

Zhu Zhu commented on FLINK-32895:
-

Either option sounds good to me. Feel free to start the discussion earlier if 
you feel there is much uncertainty that needs to be addressed early.

> user may want to restart this job multiple times using max-backoff before 
> failing it.
Yes, you are right. Yet maybe we can give it a more user-friendly name; e.g. 
`max-attempts-before-reset-backoff` looks better to me.
