gyfora opened a new pull request, #1097:
URL: https://github.com/apache/flink-kubernetes-operator/pull/1097
## What is the purpose of the change
Currently, the cluster/job health check logic is sometimes executed on
terminal or failed jobs. This can lead the operator to try to restart them
from HA metadata, which inevitably results in an unrecoverable failure.
We should simply exclude these deployments based on the job status.
This PR adds the required checks on job status and HA metadata to avoid
these unrecoverable errors.
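
As a rough illustration of the kind of guard described above, here is a minimal, self-contained sketch. It is not the code from this PR: the `HealthCheckGuard` class, the simplified `JobState` enum, and the `shouldRunHealthCheck` method are hypothetical names. It assumes the check boils down to "skip terminal jobs, and only consider a restart when HA metadata is present".

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; names and structure are assumptions, not the operator's actual API.
class HealthCheckGuard {

    // Simplified stand-in for the job status reported by the cluster.
    enum JobState { RUNNING, RESTARTING, FAILED, CANCELED, FINISHED }

    // States from which the job will not recover on its own.
    private static final Set<JobState> TERMINAL =
            EnumSet.of(JobState.FAILED, JobState.CANCELED, JobState.FINISHED);

    // Returns true only if a health-check-triggered restart could succeed:
    // the job must not be terminal and HA metadata must be available.
    static boolean shouldRunHealthCheck(JobState state, boolean haMetadataAvailable) {
        if (TERMINAL.contains(state)) {
            // Restarting a terminal job from HA metadata cannot work; skip it.
            return false;
        }
        return haMetadataAvailable;
    }

    public static void main(String[] args) {
        System.out.println(shouldRunHealthCheck(JobState.RUNNING, true));
        System.out.println(shouldRunHealthCheck(JobState.FAILED, true));
        System.out.println(shouldRunHealthCheck(JobState.RUNNING, false));
    }
}
```

Under these assumptions, only a non-terminal job with HA metadata present is eligible for the health check. Terminal jobs are excluded purely on their status, before HA metadata is even looked at.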
## Verifying this change
Unit tests extended
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., are there any changes to the `CustomResourceDescriptors`: no
- Core observer or reconciler logic that is regularly executed: yes
## Documentation
- Does this pull request introduce a new feature? no