Hello everyone,

I'm trying to move savepoint to another s3 account but restore always
failed with some weird 404 error.

We are using lyft k8s operator [1] and flink 1.13.6 (in stacktrace you can
see version 1.13.6-396a8d44-szn which is just internal build from flink
commit b2ca390d478aa855eb0f2028d0ed965803a98af1)

What I'm trying to do:

   1. create savepoint for pipeline via ./bin/flink savepoint <JOB_ID>
   2. copy data under path configured in state.savepoints.dir from source
   s3 to new s3
   3. change all configuration and restore pipeline

Is this steps correct or I'm doing something wrong or unsupported?

All options related to s3 have valid values for new s3 account but restore
failed with exception bellow. Error message includes original path
(s3://flink/savepoints/activity-searched-query) which doesn't exists on new
account so that 404 is expected. But I still don't understand why flink
tries that path because related config options contains new bucket info.
    high-availability.storageDir:
's3://<NEW_BUCKET>/ha/pipelines-runner-activity-searched-query'

jobmanager.archive.fs.dir: 's3://<NEW_BUCKET>/history'

state.checkpoints.dir:
> 's3://<NEW_BUCKET>/checkpoints/activity-searched-query'

state.savepoints.dir:
> 's3://<NEW_BUCKET>/savepoints/activity-searched-query'


+ valid values for s3.access-key and s3.secret-key

I found original path in _metadata file in savepoint data but changing that
(search&replace) leads to some weird OOM, I hope this should not be needed
and that values should be ignored.

state.backend is hashmap if it is important.

Restore back from source butcket works as expected.

Thanks a lot!

Regards,
L.

Stacktrace:

2022-03-08 15:39:25,838 [flink-akka.actor.default-dispatcher-4] INFO
>  org.apache.flink.runtime.executiongraph.ExecutionGraph -
> CombineToSearchedQuery -> (LateElementsCounter, TransformToStreamElement ->
> Sink: SearchedQueryKafkaSink) (1/2) (0c0f108c393b9a5b58f861c1032671d0)
> switched from INITIALIZING to FAILED on 10.67.158.155:45521-d8d19d @
> 10.67.158.155 (dataPort=36341).
> org.apache.flink.util.SerializedThrowable: Exception while creating
> StreamOperatorStateContext.
> at
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:254)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:272)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:441)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:585)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:565)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:540)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:759)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at java.lang.Thread.run(Thread.java:832) ~[?:?]
> Caused by: org.apache.flink.util.SerializedThrowable: Could not restore
> keyed state backend for
> WindowOperator_bd2a73c53230733509ca171c6476fcc5_(1/2) from any of the 1
> provided restore options.
> at
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:160)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:345)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:163)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> ... 10 more
> Caused by: org.apache.flink.util.SerializedThrowable: Failed when trying
> to restore heap backend
> at
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.restoreState(HeapKeyedStateBackendBuilder.java:177)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at
> org.apache.flink.runtime.state.heap.HeapKeyedStateBackendBuilder.build(HeapKeyedStateBackendBuilder.java:111)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at
> org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:131)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at
> org.apache.flink.runtime.state.hashmap.HashMapStateBackend.createKeyedStateBackend(HashMapStateBackend.java:73)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at
> org.apache.flink.runtime.state.StateBackend.createKeyedStateBackend(StateBackend.java:136)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:328)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:345)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> at
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:163)
> ~[flink-dist_2.11-1.13.6-396a8d44-szn.jar:1.13.6-396a8d44-szn]
> ... 10 more
> Caused by: org.apache.flink.util.SerializedThrowable:
> com.amazonaws.services.s3.model.AmazonS3Exception: null (Service: Amazon
> S3; Status Code: 404; Error Code: NoSuchBucket; Request ID:
> tx0000000000000011aacd6-0062276a9d-85d05-default; S3 Extended Request ID:
> 85d05-default-default; Proxy: null), S3 Extended Request ID:
> 85d05-default-default (Path:
> s3://flink/savepoints/activity-searched-query/savepoint-ff3caa-f4b6db96b68b/000e1fc6-0ed8-452a-a8f2-57650fa0594d)
> at
> com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$openStream$1(PrestoS3FileSystem.java:917)
> ~[?:?]
> at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:138) ~[?:?]
>

[1] https://github.com/lyft/flinkk8soperator

Reply via email to