[ 
https://issues.apache.org/jira/browse/FLINK-38324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035776#comment-18035776
 ] 

Francis commented on FLINK-38324:
---------------------------------

Thanks for the fix [~zakelly]! Do you think this is the same issue as the one 
documented here? https://issues.apache.org/jira/browse/FLINK-38621

> Job fails to restore keyed state backend when using Forst state backend on 
> S3: FileNotFoundException
> ----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-38324
>                 URL: https://issues.apache.org/jira/browse/FLINK-38324
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>    Affects Versions: 2.0.0, 2.1.0
>         Environment: Observed on both Flink 2.0 and 2.1, 
> running with the ForSt state backend and state stored on S3,
> job deployed on Kubernetes using the Apache Flink Kubernetes Operator.
>            Reporter: Lucas Borges
>            Assignee: Han Yin
>            Priority: Major
>
> Task manager fails with the following exception:
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for AsyncStreamFlatMap_34849252e53e8aeadce2388c44ea80ff_(1/1) 
> from any of the 1 provided restore options.
> This seems to be caused by the following error:
> Caused by: java.io.IOException: java.io.FileNotFoundException: No such file 
> or directory: 
> s3a://flink-state/flink-2-smoke-testing-job/checkpoints/f3436c2b6f059985605ce8a4f3cdc841/shared/op_AsyncStreamFlatMap_34849252e53e8aeadce2388c44ea80ff__1_1__attempt_0/db/a62f79a2-bca7-40ff-9723-01d9a0d98f4a
> Full stack trace here:
> {panel}
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
>       at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:359)
>       at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:280)
>       at 
> org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreStateAndGates(StreamTask.java:858)
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$restoreInternal$5(StreamTask.java:812)
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:812)
>       at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:771)
>       at 
> org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:963)
>       at 
> org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:932)
>       at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:756)
>       at org.apache.flink.runtime.taskmanager.Task.run(Task.java:568)
>       at java.base/java.lang.Thread.run(Thread.java:840)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for AsyncStreamFlatMap_34849252e53e8aeadce2388c44ea80ff_(1/1) 
> from any of the 1 provided restore options.
>       at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:165)
>       at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:486)
>       at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
>       ... 12 more
> Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught 
> unexpected exception.
>       at 
> org.apache.flink.state.forst.ForStKeyedStateBackendBuilder.build(ForStKeyedStateBackendBuilder.java:319)
>       at 
> org.apache.flink.state.forst.ForStStateBackend.createAsyncKeyedStateBackend(ForStStateBackend.java:474)
>       at 
> org.apache.flink.state.forst.ForStStateBackend.createAsyncKeyedStateBackend(ForStStateBackend.java:98)
>       at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$3(StreamTaskStateInitializerImpl.java:475)
>       at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:173)
>       at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:137)
>       ... 14 more
> Caused by: java.io.IOException: java.io.FileNotFoundException: No such file 
> or directory: 
> s3a://flink-state/flink-2-smoke-testing-job/checkpoints/f3436c2b6f059985605ce8a4f3cdc841/shared/op_AsyncStreamFlatMap_34849252e53e8aeadce2388c44ea80ff__1_1__attempt_0/db/a62f79a2-bca7-40ff-9723-01d9a0d98f4a
>       at 
> org.apache.flink.state.forst.datatransfer.CopyDataTransferStrategy.copyFileFromCheckpoint(CopyDataTransferStrategy.java:282)
>       at 
> org.apache.flink.state.forst.datatransfer.CopyDataTransferStrategy.transferFromCheckpoint(CopyDataTransferStrategy.java:91)
>       at 
> org.apache.flink.state.forst.datatransfer.ForStStateDataTransfer.lambda$transferAllStateDataToDirectoryAsync$4(ForStStateDataTransfer.java:305)
>       at 
> org.apache.flink.util.function.ThrowingRunnable.lambda$unchecked$0(ThrowingRunnable.java:48)
>       at 
> java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>       ... 1 more
> Caused by: java.io.FileNotFoundException: No such file or directory: 
> s3a://flink-state/flink-2-smoke-testing-job/checkpoints/f3436c2b6f059985605ce8a4f3cdc841/shared/op_AsyncStreamFlatMap_34849252e53e8aeadce2388c44ea80ff__1_1__attempt_0/db/a62f79a2-bca7-40ff-9723-01d9a0d98f4a
>       at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3866)
>       at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3688)
>       at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.extractOrFetchSimpleFileStatus(S3AFileSystem.java:5401)
>       at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1465)
>       at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:1441)
>       at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:976)
>       at 
> org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.open(HadoopFileSystem.java:134)
>       at 
> org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.open(HadoopFileSystem.java:38)
>       at 
> org.apache.flink.core.fs.PluginFileSystemFactory$ClassLoaderFixingFileSystem.open(PluginFileSystemFactory.java:128)
>       at 
> org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:77)
>       at 
> org.apache.flink.state.forst.datatransfer.CopyDataTransferStrategy.copyFileFromCheckpoint(CopyDataTransferStrategy.java:261)
>       ... 7 more {panel}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
