[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #252: [FLINK-27889] fix: Catch the error when last reconciled spec is null
gyfora commented on PR #252: URL: https://github.com/apache/flink-kubernetes-operator/pull/252#issuecomment-1146654972 Merged this to main manually -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #252: [FLINK-27889] fix: Catch the error when last reconciled spec is null
gyfora commented on PR #252: URL: https://github.com/apache/flink-kubernetes-operator/pull/252#issuecomment-1146109215 @Miuler I looked into this a little and this is the only thing we need to change for this to work (and make your test to pass): ``` diff --git a/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/ApplicationReconciler.java b/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/ApplicationReconciler.java index 442206f9..f5beddbb 100644 --- a/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/ApplicationReconciler.java +++ b/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/ApplicationReconciler.java @@ -324,7 +324,12 @@ public class ApplicationReconciler extends AbstractDeploymentReconciler { @Override @SneakyThrows protected void shutdown(FlinkDeployment flinkApp) { -flinkService.cancelJob(flinkApp, UpgradeMode.STATELESS); +var status = flinkApp.getStatus(); +if (status.getReconciliationStatus().getLastReconciledSpec() == null) { +flinkService.deleteClusterDeployment(flinkApp.getMetadata(), status, true); +} else { +flinkService.cancelJob(flinkApp, UpgradeMode.STATELESS); +} } private void triggerSavepoint(FlinkDeployment deployment) throws Exception { ``` This is also a more straightforward fix because the main problem is calling cancel job on something that was never deployed (it doesn't really make sense in the first place so no wonder it was broken) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #252: [FLINK-27889] fix: Catch the error when last reconciled spec is null
gyfora commented on PR #252: URL: https://github.com/apache/flink-kubernetes-operator/pull/252#issuecomment-1145936781 @Miuler i will take a look later tonight to see what is the minimal fix that we need. Then we can discuss if we want to refactor the cancel in a separate ticket maybe -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #252: [FLINK-27889] fix: Catch the error when last reconciled spec is null
gyfora commented on PR #252: URL: https://github.com/apache/flink-kubernetes-operator/pull/252#issuecomment-1145900109 And please don’t get me wrong, all I am trying to do is keep this as simple as possible :) I have already spent quite a lot of time simplifying these methods and the flow to make it more understandable -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #252: [FLINK-27889] fix: Catch the error when last reconciled spec is null
gyfora commented on PR #252: URL: https://github.com/apache/flink-kubernetes-operator/pull/252#issuecomment-1145898541 Do you have a yaml that I can use to reproduce this? I am still not convinced that we need to refactor etc -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #252: [FLINK-27889] fix: Catch the error when last reconciled spec is null
gyfora commented on PR #252: URL: https://github.com/apache/flink-kubernetes-operator/pull/252#issuecomment-1145773156 In addition I would like to kindly ask you to not include any refactorings with this fix initially. This will help us understand the problem your solution better. After that if we collectively feel that the cancel method needs refactoring we can discuss. At the current stage I think that method is fine as is, but maybe your fix will bring out some new currently overlooked aspects of this logic. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #252: [FLINK-27889] fix: Catch the error when last reconciled spec is null
gyfora commented on PR #252: URL: https://github.com/apache/flink-kubernetes-operator/pull/252#issuecomment-1145749943 @Miuler could you please provide a detailed description of what this change does specifically and what is the error scenario that we are handling here? This is a very critical and somewhat complex part of the operator logic so we should try to be aware of the full impact of these changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org