asukawen opened a new pull request, #1145:
URL: https://github.com/apache/flink-kubernetes-operator/pull/1145
## What is the purpose of the change
Kubernetes deployment deletion waits can fail or time out before the old
JobManager deployment is fully removed. The operator currently logs these
failures and continues reconciliation, which can submit a replacement cluster
while the old deployment is still terminating and result in `AlreadyExists`
errors.
## Brief change log
- Propagate non-404 errors while waiting for Kubernetes resources to be
deleted.
- Retry reconciliation instead of creating a replacement cluster before
deletion completes.
- Preserve the best-effort JobManager shutdown behavior before mandatory
deployment deletion.
- Update deletion error and timeout tests.
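
The behavior described in the change log can be sketched roughly as follows. This is a minimal, self-contained illustration, not the operator's actual code: `DeletionWaitSketch`, `ApiException`, `ResourceLookup`, `waitForDeletion`, and `redeploy` are hypothetical names introduced here, and the real implementation uses the Kubernetes client rather than these stand-ins. The sketch assumes a 404 means the resource is already gone, while any other error or a timeout is thrown to the caller so that the replacement is not submitted.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch; these are not the operator's actual classes. */
final class DeletionWaitSketch {

    /** Hypothetical stand-in for a Kubernetes client error carrying an HTTP status code. */
    static final class ApiException extends RuntimeException {
        final int code;

        ApiException(int code, String message) {
            super(message);
            this.code = code;
        }
    }

    /** Hypothetical lookup: returns true while the resource still exists. */
    interface ResourceLookup {
        boolean exists();
    }

    /**
     * Polls until the resource is gone. A 404 counts as "deleted"; any other
     * API error is rethrown, and a timeout is reported as an exception
     * instead of being logged and ignored.
     */
    static void waitForDeletion(ResourceLookup lookup, Duration timeout, Duration pollInterval)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                if (!lookup.exists()) {
                    return; // resource no longer present
                }
            } catch (ApiException e) {
                if (e.code == 404) {
                    return; // already deleted
                }
                throw e; // propagate non-404 errors to the caller
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("Resource still present after " + timeout);
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }

    /**
     * Submits the replacement only after deletion is confirmed. If the wait
     * throws, the submit step is skipped and the caller can retry later.
     */
    static void redeploy(
            ResourceLookup oldDeployment,
            Runnable submitReplacement,
            Duration timeout,
            Duration pollInterval)
            throws TimeoutException, InterruptedException {
        waitForDeletion(oldDeployment, timeout, pollInterval);
        submitReplacement.run();
    }
}
```

In this sketch a caller that catches the exception from `redeploy` would reschedule reconciliation, so the replacement is only attempted once the old deployment has been confirmed gone.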
## Verifying this change
This change adds and updates tests and was verified with:
```bash
JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home \
mvn -pl flink-kubernetes-operator -am \
-DskipITs \
-Dtest=AbstractFlinkServiceTest \
-Dsurefire.failIfNoSpecifiedTests=false test
```
Tests run: 40, failures: 0, errors: 0, skipped: 0.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., are there any changes to the `CustomResourceDescriptors`: no
- Core observer or reconciler logic that is regularly executed: yes
## Documentation
- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? not applicable