[ 
https://issues.apache.org/jira/browse/FLINK-26811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521050#comment-17521050
 ] 

Ted Chang edited comment on FLINK-26811 at 4/13/22 12:26 AM:
-------------------------------------------------------------

[~gyfora] 
I am trying to solve step 4 below by following the doc, which says to 
[redeploy it using the latest spec and state carried over from the previous 
run for stateful 
applications|https://github.com/apache/flink-kubernetes-operator/blob/main/docs/content/docs/custom-resource/job-management.md#stateful-and-stateless-application-upgrades],
 but no luck. Could you elaborate on how to restore a job from an existing 
savepoint? Thanks.

Here are the steps (assuming this 
[job|https://github.com/apache/flink-kubernetes-operator/blob/main/examples/basic-checkpoint-ha.yaml]
 was deployed):
1. Suspend the job with a savepoint and verify that a savepoint exists in 
/flink-data/savepoints/savepoint-000000-aec3dd08e76d/_metadata
{code:sh}
kubectl patch FlinkDeployment basic-checkpoint-ha-example --type=merge -p \
'{"spec": {"job": {"state": "suspended"}}}' {code}
2. Delete the FlinkDeployment
{code:sh}
kubectl delete flinkdep basic-checkpoint-ha-example {code}
3. Update the CRD
{code:sh}
kubectl delete crd flinkdeployments.flink.apache.org
kubectl create -f <the new crd with updated apiversion>{code}
3.5. Switch to flink-kubernetes-operator with version v1beta1?
4. Create deployments using the new API version (and also restore from the 
savepoint, I think)
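
One way step 4 might be approached is to point the recreated deployment at the 
savepoint from step 1. The sketch below only prints the `job` section that 
would go into the new manifest. It is not a confirmed answer: the helper name 
`savepoint_job_fragment` is hypothetical, and the field names 
`initialSavepointPath` and `upgradeMode` are assumptions that should be 
checked against the CRD version installed in step 3.

```shell
#!/bin/sh
# Sketch only: prints the `job` section of a FlinkDeployment spec that points
# at an existing savepoint. `savepoint_job_fragment` is a hypothetical helper.
# The field names initialSavepointPath and upgradeMode are assumptions to
# verify against the CRD version installed in step 3.
savepoint_job_fragment() {
  savepoint_path="$1"  # savepoint directory (the one containing _metadata)
  printf '  job:\n'
  printf '    initialSavepointPath: %s\n' "$savepoint_path"
  printf '    upgradeMode: savepoint\n'
  printf '    state: running\n'
}

# Print the fragment for the savepoint taken in step 1.
savepoint_job_fragment /flink-data/savepoints/savepoint-000000-aec3dd08e76d
```

The printed fragment would replace the `job` section under `spec` in a copy of 
the example manifest whose apiVersion has been updated, and that copy would 
then be created with `kubectl create -f`.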

 



> Document CRD upgrade process
> ----------------------------
>
>                 Key: FLINK-26811
>                 URL: https://issues.apache.org/jira/browse/FLINK-26811
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Kubernetes Operator
>            Reporter: Thomas Weise
>            Assignee: Ted Chang
>            Priority: Major
>             Fix For: kubernetes-operator-0.1.0
>
>
> We need to document how to update the CRD to a newer version. During 
> development, we delete the old CRD and create it from scratch. In an 
> environment with existing deployments that isn't possible, as deleting the 
> CRD would wipe out all existing CRs.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
