[ https://issues.apache.org/jira/browse/FLINK-11132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719975#comment-16719975 ]
Edmond commented on FLINK-11132:
--------------------------------

Hi [~till.rohrmann],

Thanks a lot for your response!

Edmond

> Restore From Savepoint on HA Setup
> ----------------------------------
>
>                 Key: FLINK-11132
>                 URL: https://issues.apache.org/jira/browse/FLINK-11132
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.6.2, 1.7.0
>         Environment:
>            Reporter: Edmond
>            Priority: Major
>
> In our current setup we have one job-manager (standalone-job.sh) and one
> task-manager (taskmanager.sh) deployed as a job-cluster in HA mode (ZooKeeper).
> We tried to run a simple stateful Flink app that periodically writes
> checkpoints and savepoints to shared storage, so that we could re-run it
> later from a specific savepoint. However, in HA mode it seems to ignore the
> savepoint restore flag (--fromSavepoint) and recovers from the last
> checkpoint instead. When we removed the HA configuration, savepoint
> restoration was successful.
>
> flink-conf.yaml:
> high-availability: zookeeper
> high-availability.zookeeper.quorum: zookeeper-host:2181
> high-availability.zookeeper.path.root: /flink
> high-availability.cluster-id: our_cluster_id
> high-availability.storageDir: gs://app_bucket/flink_ns/ha
> high-availability.jobmanager.port: 6123
> state.backend.fs.memory-threshold: 0
> state.checkpoints.dir: gs://app_bucket/flink_ns/checkpoints
> state.savepoints.dir: gs://app_bucket/flink_ns/savepoints
> To run it in non-HA mode, we just removed the high-availability.*
> parameters.
> Job Manager command before restore:
> ./standalone-job.sh start-foreground --job-classname com.TestApp
> -Djobmanager.rpc.address=127.0.0.1 -Dparallelism.default=1
> -Dblob.server.port=6124 -Dquery.server.ports=6125
> Job Manager command when trying to restore:
> ./standalone-job.sh start-foreground --job-classname com.TestApp
> -Djobmanager.rpc.address=127.0.0.1 -Dparallelism.default=1
> -Dblob.server.port=6124 -Dquery.server.ports=6125 --fromSavepoint
> gs://app_bucket/flink_ns/savepoints/savepoint_1/savepoint-000000-e7f1f0f63c41
>
> Task Manager command:
> ./taskmanager.sh start-foreground -Djobmanager.rpc.address=127.0.0.1
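
For anyone reproducing the HA versus non-HA comparison described in the report, here is a minimal sketch of deriving the non-HA configuration from the HA one by dropping the high-availability keys. The helper name `strip_ha_settings` is hypothetical, and treating the bare `high-availability` key as part of the removed set is an assumption (the report only mentions `high-availability.*`). It works on plain `key: value` lines and is not a general YAML parser.

```python
# Hypothetical helper: not part of Flink, only a text filter for the
# flat "key: value" layout shown in the report.
def strip_ha_settings(conf_text: str) -> str:
    """Return conf_text without the high-availability settings."""
    kept = []
    for line in conf_text.splitlines():
        # The key is everything before the first colon; values such as
        # gs://... contain further colons and must not be split again.
        key = line.split(":", 1)[0].strip()
        # Assumption: the bare "high-availability" key is removed as well.
        if key == "high-availability" or key.startswith("high-availability."):
            continue
        kept.append(line)
    return "\n".join(kept)


# Configuration copied from the report.
HA_CONF = """\
high-availability: zookeeper
high-availability.zookeeper.quorum: zookeeper-host:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: our_cluster_id
high-availability.storageDir: gs://app_bucket/flink_ns/ha
high-availability.jobmanager.port: 6123
state.backend.fs.memory-threshold: 0
state.checkpoints.dir: gs://app_bucket/flink_ns/checkpoints
state.savepoints.dir: gs://app_bucket/flink_ns/savepoints
"""

NON_HA_CONF = strip_ha_settings(HA_CONF)
print(NON_HA_CONF)
```

Only the three `state.*` lines remain, which matches the non-HA setup in which the report says the savepoint restore succeeded.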