[
https://issues.apache.org/jira/browse/FLINK-38112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yude updated FLINK-38112:
-------------------------
Description:
This pull request aligns Flink’s default for the YARN configuration option
{{yarn.application-attempt-failures-validity-interval}} with YARN itself.
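The previous default (10 000 ms) caused unexpected endless AM restarts: once the
interval between two failures exceeded ten seconds, earlier failures fell
outside the window, so the attempt count never reached the configured maximum.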
*Why {{-1}} instead of another fixed window?*
AM restart times vary by environment: some clusters restart the AM in 30
seconds, others in 3 seconds, so no fixed window fits everyone.
Setting the default to *{{-1}} (global counting)* removes the hidden assumption
and lets users choose a window that matches their own infrastructure when
needed.
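The effect of the window can be illustrated with a small simulation. This is
only a sketch of the counting rule as described above, not YARN's or Flink's
actual code: the function name {{failures_until_giveup}}, the parameter names,
and the sample failure times are hypothetical. It assumes that a failure counts
toward the maximum only if it happened less than the validity interval before
the current failure, and that a non-positive interval counts every failure.

```python
from typing import Optional, Sequence


def failures_until_giveup(
    failure_times_ms: Sequence[int],
    max_attempts: int,
    validity_interval_ms: int,
) -> Optional[int]:
    """Return how many failures occur before the application is given up.

    Returns None if the counted failures never reach max_attempts, i.e. the
    AM would keep being restarted for the whole failure sequence.

    Simplified model (assumption, not YARN's real implementation):
    - validity_interval_ms <= 0 means "count all failures globally";
    - otherwise only failures that happened less than validity_interval_ms
      before the current failure are counted.
    """
    for index, now in enumerate(failure_times_ms):
        seen = failure_times_ms[: index + 1]
        if validity_interval_ms <= 0:
            counted = len(seen)
        else:
            counted = sum(1 for t in seen if now - t < validity_interval_ms)
        if counted >= max_attempts:
            return index + 1
    return None


# Hypothetical failure sequences: one AM fails every 15 s, another every 3 s.
slow_failures = [0, 15_000, 30_000, 45_000]
fast_failures = [0, 3_000, 6_000, 9_000]

never_gives_up = failures_until_giveup(slow_failures, 2, 10_000)
global_count = failures_until_giveup(slow_failures, 2, -1)
fast_case = failures_until_giveup(fast_failures, 2, 10_000)
```

In this model, the 15-second sequence never reaches two counted failures with a
10 000 ms window, whereas with {{-1}} the second failure already exhausts the
limit. The 3-second sequence hits the limit under both settings, which is why
the old default only misbehaved in slower environments.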
was:
The default setting of 10 seconds does not make sense. Restart times can vary
significantly between environments—some restart the AM in 60 seconds, while
others do so in just 3 seconds. There is no single duration that works for all
situations.
The setting should align with YARN's default value (-1).
> Change default of yarn.application-attempt-failures-validity-interval to -1
> ---------------------------------------------------------------------------
>
> Key: FLINK-38112
> URL: https://issues.apache.org/jira/browse/FLINK-38112
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / YARN
> Reporter: yude
> Priority: Minor
> Labels: configuration, default, pull-request-available, yarn