[ 
https://issues.apache.org/jira/browse/FLINK-38112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yude updated FLINK-38112:
-------------------------
    Description: 
This pull request aligns Flink’s default for the YARN configuration option 
{{yarn.application-attempt-failures-validity-interval}} with YARN itself.
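The previous default (10 000 ms) caused unexpected endless AM restarts whenever 
consecutive failures were more than ten seconds apart: each earlier failure had 
already expired from the window, so the failure count never reached the 
configured maximum number of application attempts.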

*Why {{-1}} instead of another fixed window?*
Restart times differ between environments: some restart the AM in 30 seconds, 
others in 3 seconds. No fixed window fits every deployment.
Setting the default to *{{-1}} (global counting)* removes this hidden 
assumption about restart timing. Every failure counts toward the attempt limit, 
and users can still choose a window that matches their own infrastructure when 
needed.
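To make the failure-counting argument concrete, here is a minimal sketch that models how a validity interval decides whether another AM attempt is allowed. It is a simplified, hypothetical model, not YARN's ResourceManager code. The class and the methods `shouldRestart` and `attemptsUntilGiveUp` are illustrative names introduced here. It assumes that only failures inside the last `validityMs` milliseconds are counted and that a value <= 0 means every failure is counted.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model (not YARN's actual implementation) of how an
 * application-attempt failure validity interval affects AM restarts.
 */
public class ValidityIntervalSketch {

    /**
     * Returns true if another attempt is allowed, i.e. the number of counted
     * failures is still below maxAttempts.
     *
     * @param failureTimesMs timestamps (ms) of AM attempt failures
     * @param nowMs          current time (ms)
     * @param validityMs     validity window; values <= 0 mean "count every failure"
     * @param maxAttempts    maximum number of application attempts
     */
    static boolean shouldRestart(List<Long> failureTimesMs, long nowMs,
                                 long validityMs, int maxAttempts) {
        long counted = 0;
        for (long t : failureTimesMs) {
            // With a positive window, only failures younger than validityMs count.
            if (validityMs <= 0 || nowMs - t < validityMs) {
                counted++;
            }
        }
        return counted < maxAttempts;
    }

    /**
     * Simulates failures spaced gapMs apart. Returns the attempt at which the
     * application gives up, or cap if it is still restarting after cap attempts.
     */
    static int attemptsUntilGiveUp(long gapMs, long validityMs, int maxAttempts, int cap) {
        List<Long> failures = new ArrayList<>();
        for (int attempt = 1; attempt <= cap; attempt++) {
            long failureTime = attempt * gapMs;
            failures.add(failureTime);
            if (!shouldRestart(failures, failureTime, validityMs, maxAttempts)) {
                return attempt;
            }
        }
        return cap; // never gave up within the simulated attempts
    }

    public static void main(String[] args) {
        int maxAttempts = 2;
        // Failures 30 s apart with a 10 s window: only one failure is ever counted.
        System.out.println(attemptsUntilGiveUp(30_000, 10_000, maxAttempts, 100)); // 100
        // Same failures with -1 (global counting): the application stops.
        System.out.println(attemptsUntilGiveUp(30_000, -1, maxAttempts, 100));     // 2
    }
}
```

With a 10 s window and failures 30 s apart, the model keeps restarting until it hits the simulation cap. With {{-1}}, it gives up after the second failure, which mirrors the behaviour described above.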

  was:
The default setting of 10 seconds does not make sense. Restart times can vary 
significantly between environments—some restart the AM in 60 seconds, while 
others do so in just 3 seconds. There is no single duration that works for all 
situations.

The setting should align with YARN's default value (-1).


> Change default of yarn.application-attempt-failures-validity-interval to -1
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-38112
>                 URL: https://issues.apache.org/jira/browse/FLINK-38112
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>            Reporter: yude
>            Priority: Minor
>              Labels: configuration, default, pull-request-available, yarn



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
