[ https://issues.apache.org/jira/browse/FLINK-38133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jeremyMu updated FLINK-38133:
-----------------------------
    Attachment:     (was: 486dc1fed2cf5b84922ede34d479015.png)

> Unable to find checkpoint status analysis during Flink restart on
> k8s,jobmanager created by deployment
> ------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-38133
>                 URL: https://issues.apache.org/jira/browse/FLINK-38133
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.16.2
>            Reporter: jeremyMu
>            Priority: Major
>         Attachments: 486dc1fed2cf5b84922ede34d479015-1.png,
> AgAABj35qkdRNkkxGZlEc4xVnl4UNi2l.png
>
>
> Before exiting abnormally, the JM clears the HA metadata (for example, the
> checkpoint pointers).
> In our production jobs, the number of TM retries is limited by
> configuration (in some business scenarios, the TaskManager must not retry
> indefinitely). If the TM reaches the retry limit and still fails to bring
> the job up, the JM crashes. When the JM crashes, the metadata stored by HA
> is cleared (see the logic in the source code). As a result, when the JM
> restarts automatically, it cannot find the HA metadata and therefore cannot
> locate the most recent checkpoint state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)