Hi, Puneet:
As Terry says, if your job fails unexpectedly, check the restart-strategy
setting in your flink-conf.yaml. If the restart strategy is set to
disabled or none, the job transitions to FAILED as soon as it encounters
a failure. The job would also fail itself
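
For reference, a minimal sketch of what a restart strategy could look like in
flink-conf.yaml. The keys are the standard Flink ones; the attempt count and
delay are placeholder values picked for illustration, not a recommendation
for your cluster:

```yaml
# flink-conf.yaml (illustrative values; tune for your cluster)
# Restart the job a bounded number of times instead of failing immediately.
restart-strategy: fixed-delay
# Assumed value: give up after 3 failed restart attempts.
restart-strategy.fixed-delay.attempts: 3
# Assumed value: wait 10 seconds between attempts.
restart-strategy.fixed-delay.delay: 10 s
```

With a fixed-delay strategy like this, the job should only end up in FAILED
after the configured attempts are exhausted.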
Hi Terry Wang,
To add to the context above: whenever a task manager goes down, its jobs go
into the FAILED state and do not restart, even though there are enough free
slots available on the other task managers to restart them on.
Regards,
Puneet
> On 04-Mar-2022, at 4:54 PM, Terry Wang wrote:
Hi, Puneet~
AFAIK, it is expected behavior that jobs on a crashed TaskManager
restart. HA means there is no single point of failure, but a Flink job still
needs to go through failover to ensure state and data consistency. You may refer to
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops
Hi,
Currently in production, I have an HA session-mode Flink cluster with 3 job
managers and multiple task managers with more than enough free task slots. But
I have seen multiple times that whenever a task manager goes down (e.g. due to
a heartbeat issue), so do all the jobs running on it even wh