[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #252: [FLINK-27889] fix: Catch the error when last reconciled spec is null

2022-06-04 Thread GitBox


gyfora commented on PR #252:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/252#issuecomment-1146654972

   Merged this to main manually


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #252: [FLINK-27889] fix: Catch the error when last reconciled spec is null

2022-06-03 Thread GitBox


gyfora commented on PR #252:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/252#issuecomment-1146109215

   @Miuler I looked into this a little and this is the only thing we need to 
change for this to work (and make your test to pass):
   
   ```
   diff --git 
a/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/ApplicationReconciler.java
 
b/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/ApplicationReconciler.java
   index 442206f9..f5beddbb 100644
   --- 
a/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/ApplicationReconciler.java
   +++ 
b/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/ApplicationReconciler.java
   @@ -324,7 +324,12 @@ public class ApplicationReconciler extends 
AbstractDeploymentReconciler {
@Override
@SneakyThrows
protected void shutdown(FlinkDeployment flinkApp) {
   -flinkService.cancelJob(flinkApp, UpgradeMode.STATELESS);
   +var status = flinkApp.getStatus();
   +if (status.getReconciliationStatus().getLastReconciledSpec() == 
null) {
   +flinkService.deleteClusterDeployment(flinkApp.getMetadata(), 
status, true);
   +} else {
   +flinkService.cancelJob(flinkApp, UpgradeMode.STATELESS);
   +}
}
   
private void triggerSavepoint(FlinkDeployment deployment) throws 
Exception {
   ```
   
   This is also a more straightforward fix because the main problem is calling 
cancel job on something that was never deployed (it doesn't really make sense 
in the first place so no wonder it was broken)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #252: [FLINK-27889] fix: Catch the error when last reconciled spec is null

2022-06-03 Thread GitBox


gyfora commented on PR #252:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/252#issuecomment-1145936781

   @Miuler i will take a look later tonight to see what is the minimal fix that 
we need. Then we can discuss if we want to refactor the cancel in a separate 
ticket maybe 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #252: [FLINK-27889] fix: Catch the error when last reconciled spec is null

2022-06-03 Thread GitBox


gyfora commented on PR #252:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/252#issuecomment-1145900109

   And please don’t get me wrong, all I am trying to do is keep this as simple 
as possible :) I have already spent quite a lot of time simplifying these 
methods and the flow to make it more understandable 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #252: [FLINK-27889] fix: Catch the error when last reconciled spec is null

2022-06-03 Thread GitBox


gyfora commented on PR #252:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/252#issuecomment-1145898541

   Do you have a yaml that I can use to reproduce this? I am still not 
convinced that we need to refactor etc


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #252: [FLINK-27889] fix: Catch the error when last reconciled spec is null

2022-06-03 Thread GitBox


gyfora commented on PR #252:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/252#issuecomment-1145773156

   In addition I would like to kindly ask you to not include any refactorings 
with this fix initially. This will help us understand the problem your solution 
better.
   
   After that if we collectively feel that the cancel method needs refactoring 
we can discuss. At the current stage I think that method is fine as is, but 
maybe your fix will bring out some new currently overlooked aspects of this 
logic.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [flink-kubernetes-operator] gyfora commented on pull request #252: [FLINK-27889] fix: Catch the error when last reconciled spec is null

2022-06-03 Thread GitBox


gyfora commented on PR #252:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/252#issuecomment-1145749943

   @Miuler could you please provide a detailed description of what this change 
does specifically and what is the error scenario that we are handling here?
   
   This is a very critical and somewhat complex part of the operator logic so 
we should try to be aware of the full impact of these changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org