Re: [PR] [FLINK-33763] Support savepoint redeployment through a nonce [flink-kubernetes-operator]

via GitHub Thu, 07 Dec 2023 03:56:40 -0800


gyfora commented on code in PR #724:
URL: https://github.com/apache/flink-kubernetes-operator/pull/724#discussion_r1418852248



##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/AbstractJobReconciler.java:
##########
@@ -306,6 +319,40 @@ protected void resubmitJob(FlinkResourceContext<CR> ctx, boolean requireHaMetada
         restoreJob(ctx, specToRecover, ctx.getObserveConfig(), requireHaMetadata);
     }
 
+    private void redeployWithSavepoint(
+            FlinkResourceContext<CR> ctx,
+            Configuration deployConfig,
+            CR resource,
+            STATUS status,
+            SPEC currentDeploySpec,
+            JobState desiredJobState)
+            throws Exception {
+        LOG.info("Redeploying from savepoint");
+        cancelJob(ctx, UpgradeMode.STATELESS);

Review Comment:
   The only two meaningful options here are STATELESS and SAVEPOINT (last-state 
wouldn't trigger a new checkpoint).
   
   The current rationale is that you redeploy only when you want to reset or 
fix your job by going back to a specific state. In most of these cases the job 
is already failing, so taking a savepoint would not be possible.
   
   If the savepoint path is corrupted or doesn't exist, the user is expected to 
fix the path and redeploy again, which should be doable.
   
   The big downside I see with taking a savepoint is that you may be 
redeploying precisely because the job or operator has problems executing 
regular upgrades, savepoints timing out, etc.
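
The reasoning above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the operator's code: `RedeploySketch`, `needsWorkingJob` and `cancelModeForRedeploy` are hypothetical names, and the `UpgradeMode` enum is a local stand-in declared so the snippet compiles on its own.

```java
public class RedeploySketch {
    /** Local stand-in for the operator's UpgradeMode enum so the sketch compiles alone. */
    enum UpgradeMode { STATELESS, LAST_STATE, SAVEPOINT }

    /** Assumption: only SAVEPOINT asks the running job to write fresh state on cancel. */
    static boolean needsWorkingJob(UpgradeMode mode) {
        return mode == UpgradeMode.SAVEPOINT;
    }

    /**
     * Hypothetical helper: the mode used to cancel before restoring from the
     * savepoint path supplied by the user. STATELESS does not depend on the
     * (possibly failing) job being able to take a savepoint.
     */
    static UpgradeMode cancelModeForRedeploy() {
        return UpgradeMode.STATELESS;
    }

    public static void main(String[] args) {
        UpgradeMode mode = cancelModeForRedeploy();
        System.out.println(mode + " needs a working job: " + needsWorkingJob(mode));
    }
}
```

The point of the sketch is that a redeploy cancels in a mode that works even when the job cannot savepoint. Validation of the target savepoint path is left to the restore step, and the user fixes the path and redeploys if it is wrong.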



