Dennis-Mircea commented on code in PR #1101:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/1101#discussion_r3140324850


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java:
##########
@@ -370,9 +370,12 @@ public CancelResult cancelSessionJob(
             switch (suspendMode) {
                 case STATELESS:
                 case CANCEL:
-                    cancelJobOrError(clusterClient, status, suspendMode == 
SuspendMode.STATELESS);
-                    // This is async we need to return and re-observe
-                    return CancelResult.pending();
+                    if (cancelJobOrError(

Review Comment:
   One concern on the scope: this PR only hardens the `FlinkSessionJob` cancel 
path (`AbstractFlinkService#cancelSessionJob` / the session-job finalizer), but 
`AbstractFlinkService#cancelJob`, used for the `FlinkDeployment` (application 
cluster) path, is left untouched and can hit the exact same failure mode 
described in [FLINK-37766](https://issues.apache.org/jira/browse/FLINK-37766):
   
   Could we:
   1. Extend the new "already terminal / not found" handling to 
`AbstractFlinkService#cancelJob`?
   2. Add a unit test mirroring the new session-job test, but against 
`cancelJob`, with a mocked JM response returning the terminal-state error (and 
ideally a 404), asserting the finalizer completes cleanly.
   3. Update the PR description (and ideally the JIRA) to make clear the fix 
covers both CR types?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to