Hi everyone!

I have seen the same behavior with native Kubernetes HA and multiple
JobManagers.
So I have a question: are there any plans to make it possible to keep
the job running, without a restart, when the JobManager leader
changes? Or are there insurmountable obstacles that prevent this?

I'm not very familiar with Flink internals, but I've tried to figure
out whether this is possible. As far as I understand, the first major
problem is that the new JM does not have the information it needs to
manage the job after the leader re-election.
Today the JM can only obtain the ExecutionAttemptIDs of the tasks
running on the TaskExecutors from the heartbeat payload (as the
reconciliation mechanism already does) and the data kept in the HA
store (checkpoint info, JobGraph).
Important runtime data, such as the assignment of tasks to slots, is
lost when the previous leader dies.
At first I thought the ArchivedExecutionGraph could be adapted to save
this runtime information, but then I realized that it was designed for
a completely different purpose.
In any case, if such information were available, we could add a
SchedulingStrategy wrapper that, at the very least, reuses the
currently active tasks instead of deploying new Executions.
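
To make the idea concrete, here is a minimal sketch in plain Java. It
is not Flink code: LeaderFailoverSketch, PersistedAssignment, Plan and
reconcile are hypothetical names, and the sketch assumes that the old
leader had persisted the task-to-slot assignment somewhere the new
leader can read it, which Flink does not do today. It only shows the
decision step: compare the persisted assignments with the attempt IDs
reported in heartbeats, then adopt the tasks that are still running
and redeploy the rest.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Flink API: all names below are made up.
class LeaderFailoverSketch {

    /** Assumed to have been persisted by the previous leader. */
    static final class PersistedAssignment {
        final String vertexId;   // which subtask this is
        final String attemptId;  // attempt the old leader deployed
        final String slotId;     // slot the attempt was deployed into

        PersistedAssignment(String vertexId, String attemptId, String slotId) {
            this.vertexId = vertexId;
            this.attemptId = attemptId;
            this.slotId = slotId;
        }
    }

    /** Outcome of the comparison for the new leader. */
    static final class Plan {
        final Map<String, String> adopt = new LinkedHashMap<>(); // vertexId -> slotId
        final List<String> redeploy = new ArrayList<>();         // vertexIds
    }

    /**
     * Splits the persisted assignments into tasks that are still running
     * (their attempt ID was reported in a heartbeat) and tasks that must
     * be deployed again.
     */
    static Plan reconcile(List<PersistedAssignment> persisted,
                          Set<String> reportedAttemptIds) {
        Plan plan = new Plan();
        for (PersistedAssignment a : persisted) {
            if (reportedAttemptIds.contains(a.attemptId)) {
                plan.adopt.put(a.vertexId, a.slotId);
            } else {
                plan.redeploy.add(a.vertexId);
            }
        }
        return plan;
    }
}
```

A scheduling wrapper along these lines would call reconcile once
after the new leader is elected and only deploy the vertices listed in
redeploy. A real implementation would also have to handle tasks that
are reported but unknown to the new leader, and would probably have to
restart whole failover regions rather than single tasks.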

What do you think?

Best regards,
Manasyan Tigran

On 2022/05/10 08:59:13 Konstantin Knauf wrote:
> Hi Matyas,
>
> yes, that's expected. The feature should never have been called "high
> availability", but something like "Flink JobManager failover", because
> that's what it is.
>
> What you gain with standby JobManagers is a faster failover, because a
> new JobManager does not need to be started before the job is restarted.
> That's all.
>
> Cheers,
>
> Konstantin
>
> On Tue, May 10, 2022 at 10:56 AM, Őrhidi Mátyás <
> matyas.orh...@gmail.com> wrote:
>
> > Hi Folks!
> >
> > I've been goofing around with the JobManager HA configs using multiple JM
> > replicas (in the Flink Kubernetes Operator). It's working seemingly fine,
> > however the job itself is being restarted when you kill the leader JM pod.
> > Is this expected?
> >
> > Thanks,
> > Matyas
> >
>
>
> --
> https://twitter.com/snntrable
> https://github.com/knaufk
>
