gyfora opened a new pull request, #1097:
URL: https://github.com/apache/flink-kubernetes-operator/pull/1097

   ## What is the purpose of the change
   
   Currently, the cluster/job health check logic is sometimes executed on 
terminal or failed jobs. This can cause the operator to try to restart them 
from HA metadata, which inevitably leads to an unrecoverable failure. 
   We should simply exclude these deployments based on the job status.
   
   This PR adds the required checks on job status and HA metadata to avoid 
unrecoverable errors.
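
As a rough illustration of the kind of guard described above, here is a minimal, self-contained sketch. It is not the code in this PR: `HealthCheckGuard`, `JobState`, `shouldCheckHealth`, and `canRestartFromHaMetadata` are hypothetical names, and the set of states treated as terminal is an assumption made for the example.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch, not the operator's actual classes or method names.
class HealthCheckGuard {

    /** Assumed job states, loosely modeled on Flink job status values. */
    enum JobState { RUNNING, RESTARTING, FAILING, FAILED, CANCELED, FINISHED }

    /** States treated as terminal in this sketch (an assumption for illustration). */
    private static final Set<JobState> TERMINAL =
            EnumSet.of(JobState.FAILED, JobState.CANCELED, JobState.FINISHED);

    /** Health checks are skipped entirely for terminal/failed jobs. */
    static boolean shouldCheckHealth(JobState state) {
        return !TERMINAL.contains(state);
    }

    /** A restart from HA metadata is attempted only for non-terminal jobs with metadata present. */
    static boolean canRestartFromHaMetadata(JobState state, boolean haMetadataAvailable) {
        return shouldCheckHealth(state) && haMetadataAvailable;
    }

    public static void main(String[] args) {
        for (JobState state : JobState.values()) {
            System.out.println(state
                    + " checkHealth=" + shouldCheckHealth(state)
                    + " restartWithHa=" + canRestartFromHaMetadata(state, true));
        }
    }
}
```

The point of the sketch is that both conditions are checked before any restart is attempted, so a failed job is never picked up by the health check path, even if HA metadata happens to exist.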
   
   ## Verifying this change
   
   Unit tests extended
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., are there any changes to the `CustomResourceDescriptors`: no
     - Core observer or reconciler logic that is regularly executed: yes
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
