[ https://issues.apache.org/jira/browse/FLINK-37354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anton Solovev updated FLINK-37354:
----------------------------------
Description:
The Kubernetes Operator health check is not aligned with the checkpoint interval
when the interval is set via the Java API. For example:
{code:java}
var checkpointConfig = env.getCheckpointConfig();
checkpointConfig.setCheckpointInterval(Duration.ofHours(2).toMillis());
{code}
This leads to exceptions, and the operator then restarts the JobManager:
{noformat}
2025-01-28 10:15:32,435 o.a.f.k.o.l.AuditUtils [INFO
][flink-jobs/job-1] >>> Event[Job] | Warning | RESTARTUNHEALTHYJOB |
Restarting unhealthy job
{noformat}
Nevertheless, there are ways to mitigate this:
# disable *kubernetes.operator.cluster.health-check.checkpoint-progress.enabled*
# set *kubernetes.operator.cluster.health-check.checkpoint-progress.window* to two hours as well
# never use the Java API to set the checkpoint interval
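The mismatch behind mitigation 2 can be sketched in plain Java. This is only an illustration, not operator code: the class and the windowCoversInterval helper are hypothetical, the 5-minute window is an example value rather than a confirmed default, and the operator's real check may apply additional rules.

```java
import java.time.Duration;

public class HealthCheckWindowCheck {

    // Hypothetical helper, not part of Flink or the operator: returns true when
    // the health-check window is at least as long as the checkpoint interval,
    // so that at least one checkpoint can complete inside the window.
    static boolean windowCoversInterval(Duration window, Duration checkpointInterval) {
        return window.compareTo(checkpointInterval) >= 0;
    }

    public static void main(String[] args) {
        Duration checkpointInterval = Duration.ofHours(2); // interval from the report
        Duration shortWindow = Duration.ofMinutes(5);      // illustrative value only
        Duration alignedWindow = Duration.ofHours(2);      // mitigation 2

        // A window shorter than the interval cannot see a completed checkpoint.
        System.out.println(windowCoversInterval(shortWindow, checkpointInterval));   // false
        // A window equal to the interval satisfies this simplified check.
        System.out.println(windowCoversInterval(alignedWindow, checkpointInterval)); // true
    }
}
```

With a window shorter than the interval, no checkpoint can complete inside the window, which is consistent with the RESTARTUNHEALTHYJOB event shown above.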
> Kubernetes Operator HealthCheck compatibility
> ---------------------------------------------
>
> Key: FLINK-37354
> URL: https://issues.apache.org/jira/browse/FLINK-37354
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Affects Versions: 1.10.0
> Reporter: Anton Solovev
> Priority: Minor