lucasgameiroborges opened a new pull request, #1147: URL: https://github.com/apache/flink-kubernetes-operator/pull/1147
## What is the purpose of the change

When HA is enabled and a stateless job is resubmitted (e.g. due to an unhealthy cluster), `resubmitJob` unconditionally overrode the upgrade mode to `LAST_STATE` and passed `requireHaMetadata=true` to `restoreJob`, ignoring the user-configured `STATELESS` upgrade mode. This caused the job to attempt a last-state restore from HA metadata instead of starting fresh.

## Brief change log

- `AbstractJobReconciler#resubmitJob`: skip the `LAST_STATE` mode override and the `requireHaMetadata` flag when the spec's upgrade mode is `STATELESS`
- Add `ApplicationReconcilerTest#testRestartUnhealthyStatelessJobWithHaEnabled` to reproduce the scenario: a stateless job with HA enabled is resubmitted after an unhealthy event with no HA metadata available. Previously this would throw `UpgradeFailureException`; now it succeeds.

## Verifying this change

New unit test `testRestartUnhealthyStatelessJobWithHaEnabled` in `ApplicationReconcilerTest` covers the fix. Existing `ApplicationReconcilerTest` and `ApplicationReconcilerUpgradeModeTest` suites (109 tests total) continue to pass.

## Does this pull request potentially affect one of the following areas

- Job lifecycle/upgrade: **yes**, affects the resubmit path for stateless jobs when HA is active

_This is a fix for [FLINK-38049](https://issues.apache.org/jira/browse/FLINK-38049)._
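
For readers who want the behavior change at a glance, the following is a minimal, self-contained sketch of the decision described in the change log. It is not the operator's code: `ResubmitSketch`, `Plan`, `planBeforeFix`, and `planAfterFix` are hypothetical names, the `UpgradeMode` enum here is a local stand-in, and the HA-disabled branch (keep the spec's mode, do not require HA metadata) is an assumption made only to keep the example complete.

```java
/** Hypothetical, simplified model of the resubmit decision; not the operator's actual code. */
public class ResubmitSketch {

    /** Local stand-in for the operator's upgrade modes. */
    public enum UpgradeMode { STATELESS, SAVEPOINT, LAST_STATE }

    /** Which mode the job is restored with, and whether HA metadata must be present. */
    public static final class Plan {
        public final UpgradeMode mode;
        public final boolean requireHaMetadata;

        Plan(UpgradeMode mode, boolean requireHaMetadata) {
            this.mode = mode;
            this.requireHaMetadata = requireHaMetadata;
        }

        @Override
        public String toString() {
            return "Plan(mode=" + mode + ", requireHaMetadata=" + requireHaMetadata + ")";
        }
    }

    /** Behavior described as the bug: with HA enabled, the spec's mode is always overridden. */
    public static Plan planBeforeFix(UpgradeMode specMode, boolean haEnabled) {
        if (haEnabled) {
            return new Plan(UpgradeMode.LAST_STATE, true);
        }
        // Assumption: without HA, the spec's mode is kept and no HA metadata is required.
        return new Plan(specMode, false);
    }

    /** Behavior described as the fix: a STATELESS spec skips the override and the flag. */
    public static Plan planAfterFix(UpgradeMode specMode, boolean haEnabled) {
        if (haEnabled && specMode != UpgradeMode.STATELESS) {
            return new Plan(UpgradeMode.LAST_STATE, true);
        }
        return new Plan(specMode, false);
    }

    public static void main(String[] args) {
        // Stateless job, HA enabled: the scenario covered by the new unit test.
        System.out.println("before: " + planBeforeFix(UpgradeMode.STATELESS, true));
        System.out.println("after:  " + planAfterFix(UpgradeMode.STATELESS, true));
    }
}
```

In this model, the pre-fix plan for a stateless job with HA enabled demands HA metadata, which is the condition under which the PR reports `UpgradeFailureException` when none is available. The post-fix plan keeps `STATELESS` and drops the requirement, so the job starts fresh.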
