gyfora commented on code in PR #724:
URL: https://github.com/apache/flink-kubernetes-operator/pull/724#discussion_r1418852248
##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/reconciler/deployment/AbstractJobReconciler.java:
##########
@@ -306,6 +319,40 @@ protected void resubmitJob(FlinkResourceContext<CR> ctx, boolean requireHaMetada
         restoreJob(ctx, specToRecover, ctx.getObserveConfig(), requireHaMetadata);
     }
+    private void redeployWithSavepoint(
+            FlinkResourceContext<CR> ctx,
+            Configuration deployConfig,
+            CR resource,
+            STATUS status,
+            SPEC currentDeploySpec,
+            JobState desiredJobState)
+            throws Exception {
+        LOG.info("Redeploying from savepoint");
+        cancelJob(ctx, UpgradeMode.STATELESS);

Review Comment:
The only two meaningful options here are STATELESS or SAVEPOINT (LAST_STATE wouldn't trigger a new checkpoint). The current rationale is that you only redeploy when you want to reset or fix your job by going back to a specific state. In most of these cases the job is already failing, so taking a savepoint is usually not an option anyway. If the savepoint path is corrupted or doesn't exist, the user is expected to redeploy again with a corrected path, which should be doable.

The big downside I see with taking a savepoint is that you may be redeploying precisely because the job/operator has problems executing regular upgrades (savepoints timing out, etc.).

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org