[ https://issues.apache.org/jira/browse/FLINK-26811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521050#comment-17521050 ]
Ted Chang edited comment on FLINK-26811 at 4/13/22 12:26 AM:
-------------------------------------------------------------

[~gyfora] I am trying to solve step 4 below by following [redeploy it using the latest spec and state carried over from the previous run for stateful applications|https://github.com/apache/flink-kubernetes-operator/blob/main/docs/content/docs/custom-resource/job-management.md#stateful-and-stateless-application-upgrades] in the docs, but with no luck. Could you elaborate on how to restore a job from an existing savepoint? Thanks.

Here are the steps (assuming this [job|https://github.com/apache/flink-kubernetes-operator/blob/main/examples/basic-checkpoint-ha.yaml] was deployed):

1. Suspend the job with a savepoint and verify that a savepoint exists in /flink-data/savepoints/savepoint-000000-aec3dd08e76d/_metadata
{code:sh}
kubectl patch FlinkDeployment basic-checkpoint-ha-example --type=merge -p '{"spec": {"job": {"state": "suspended"}}}'
{code}
2. Delete the FlinkDeployment
{code:sh}
kubectl delete flinkdep basic-checkpoint-ha-example
{code}
3. Update the CRD
{code:sh}
kubectl delete crd flinkdeployments.flink.apache.org
kubectl create -f <the new crd with updated apiversion>
{code}
3.5 Switch to flink-kubernetes-operator with version v1beta1?
4. Create the deployments using the new API version (and also restore from the savepoint, I think)

was (Author: JIRAUSER287036):

[~gyfora] I am trying to solve step 4 below by following [redeploy it using the latest spec and state carried over from the previous run for stateful applications|https://github.com/apache/flink-kubernetes-operator/blob/main/docs/content/docs/custom-resource/job-management.md#stateful-and-stateless-application-upgrades] in the docs, but with no luck. Could you elaborate on how to restore a job from an existing savepoint? Thanks.

Here are the steps (assuming this [job|https://github.com/apache/flink-kubernetes-operator/blob/main/examples/basic-checkpoint-ha.yaml] was deployed):

1. Suspend the job with a savepoint and verify that a savepoint exists in /flink-data/savepoints/savepoint-000000-aec3dd08e76d/_metadata
{code:sh}
kubectl patch FlinkDeployment basic-checkpoint-ha-example --type=merge -p '{"spec": {"job": {"state": "suspended"}}}'
{code}
2. Delete the FlinkDeployment
{code:sh}
kubectl delete flinkdep basic-checkpoint-ha-example
{code}
3. Update the CRD
{code:sh}
kubectl delete crd flinkdeployments.flink.apache.org
kubectl create -f <the new crd with updated apiversion>
{code}
4. Create the deployments using the new API version (and also restore from the savepoint, I think)

> Document CRD upgrade process
> ----------------------------
>
>                 Key: FLINK-26811
>                 URL: https://issues.apache.org/jira/browse/FLINK-26811
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Kubernetes Operator
>            Reporter: Thomas Weise
>            Assignee: Ted Chang
>            Priority: Major
>             Fix For: kubernetes-operator-0.1.0
>
>
> We need to document how to update the CRD with a newer version. During
> development, we delete the old CRD and create it from scratch. In an
> environment with existing deployments that isn't possible, as deleting the
> CRD would wipe out all existing CRs.
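
For step 4 in the comment above, one possible approach is to point the recreated deployment at the savepoint taken in step 1. The sketch below only prints the relevant part of a manifest; it does not talk to the cluster. It assumes that the operator version in use supports a {{spec.job.initialSavepointPath}} field, which should be checked against the installed CRD. The helper name {{render_job_spec}} is hypothetical, and the {{jarURI}} and {{parallelism}} values are illustrative and should be taken from the actual example manifest.

{code:sh}
#!/bin/sh
# Sketch only: print the job section of a FlinkDeployment that starts
# from an existing savepoint.
# ASSUMPTION: spec.job.initialSavepointPath is supported by the installed
# operator/CRD version. render_job_spec is a hypothetical helper name.
render_job_spec() {
  savepoint_path="$1"
  cat <<EOF
spec:
  # ... other fields copied unchanged from the original manifest ...
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar  # illustrative
    parallelism: 2                                                        # illustrative
    upgradeMode: savepoint
    state: running
    initialSavepointPath: ${savepoint_path}
EOF
}

# Savepoint directory reported in step 1.
render_job_spec /flink-data/savepoints/savepoint-000000-aec3dd08e76d
{code}

The printed fragment would be merged into the manifest that is created with the new API version, after the CRD from step 3 is in place.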