[ 
https://issues.apache.org/jira/browse/FLINK-27127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler reassigned FLINK-27127:
----------------------------------------

    Assignee: Chesnay Schepler

> Local recovery is not triggered on task manager process restart
> ---------------------------------------------------------------
>
>                 Key: FLINK-27127
>                 URL: https://issues.apache.org/jira/browse/FLINK-27127
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.15.0
>            Reporter: Abdullah alkhawatrah
>            Assignee: Chesnay Schepler
>            Priority: Blocker
>
> Hey,
> I am experimenting with the support of local recovery after process restart 
> introduced in 1.15. I am trying this on minikube.
> So far, it seems that every time a pod restarts, remote recovery is triggered.
> I have created a repo with everything needed to test it locally with 
> minikube: [https://github.com/akhawatrahTW/flink-local-recovery-test].
> The readme contains the steps to reproduce.
>  
> Based on the documentation, I was expecting to have local recovery triggered 
> on pod restarts since the needed configs are set: 
> [https://github.com/akhawatrahTW/flink-local-recovery-test/blob/bfef14e45f475ba953a05b50b8829d9d33bdcec6/k8s/flink-configuration-configmap.yaml#L27.]
> So was expecting to see something similar to this in the logs of the 
> recreated task manager pod:
> *Expected:*
> {code:java}
> 2022-04-07 09:17:17,637 INFO  
> org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation
>  [] - Starting to restore from state handle: 
> IncrementalLocalKeyedStateHandle{metaDataState=File State: 
> file:/pv/tm_flink-taskmanager-2/localState/aid_e56a834e076a6d8f9dc1a2997e97a91a/jid_f88542b420546fadbc94db66b00cb5a0/vtx_20ba6b65f97481d5570070de90e4e791_sti_2/chk_1208/c2756339-8938-4949-84ff-d7ee3f4c55cf
>  [479 bytes]} 
> DirectoryKeyedStateHandle{directoryStateHandle=DirectoryStateHandle{directory=/pv/tm_flink-taskmanager-2/localState/aid_e56a834e076a6d8f9dc1a2997e97a91a/jid_f88542b420546fadbc94db66b00cb5a0/vtx_20ba6b65f97481d5570070de90e4e791_sti_2/chk_1208/5455302ce9554a1f81365aee368f267e},
>  keyGroupRange=KeyGroupRange{startKeyGroup=86, endKeyGroup=127}} without 
> rescaling.{code}
>  
>  
> But for some reason, remote recovery it triggered:
> *Actual:*
> {code:java}
> 2022-04-07 09:17:18,405 INFO  
> org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation
>  [] - Finished restoring from state handle: 
> IncrementalRemoteKeyedStateHandle{backendIdentifier=544f3300-36bd-40a6-9ee3-f78b0e47dfd6,
>  stateHandleId=c2753d01-2f6b-49f0-9ca1-df6b54c61490, 
> keyGroupRange=KeyGroupRange{startKeyGroup=0, endKeyGroup=42}, 
> checkpointId=1208, 
> sharedState={001526.sst=ByteStreamStateHandle{handleName='f5a113d0-8094-40e7-a1b1-adc4cfc690c2',
>  dataBytes=23107}, 
> 001527.sst=ByteStreamStateHandle{handleName='3806411e-8213-406a-bbd8-e498ab19d118',
>  dataBytes=15579}, 
> 001528.sst=ByteStreamStateHandle{handleName='4fef6ead-1522-4f61-a6ad-399b334b41ca',
>  dataBytes=15839}, 
> 001529.sst=ByteStreamStateHandle{handleName='f1324a0c-3eae-46b0-acc2-c03d32b0c24a',
>  dataBytes=16055}}, 
> privateState={OPTIONS-001237=ByteStreamStateHandle{handleName='2e36d07b-5f91-4c9d-9778-5a16bb6254d5',
>  dataBytes=9924}, 
> MANIFEST-001234=ByteStreamStateHandle{handleName='4c95b38a-4afa-4154-9c89-9518d6384a25',
>  dataBytes=27356}, 
> CURRENT=ByteStreamStateHandle{handleName='17bd5bab-c369-470a-bf29-e76279cef2ba',
>  dataBytes=16}}, 
> metaStateHandle=ByteStreamStateHandle{handleName='15827f44-0ab2-4562-b8eb-812b8d260206',
>  dataBytes=479}, registered=false} without rescaling.{code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to