[ https://issues.apache.org/jira/browse/FLINK-32111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723164#comment-17723164 ]
Gyula Fora commented on FLINK-32111:
------------------------------------

I have seen this issue once in the past. It could be that the Flink REST API returned some unexpected data, resulting in a null somewhere in the response. We should add a null check in the logic.

> Replacing cluster in failed state with a new one failed
> -------------------------------------------------------
>
>                 Key: FLINK-32111
>                 URL: https://issues.apache.org/jira/browse/FLINK-32111
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>            Reporter: Tamir Sagi
>            Priority: Major
>         Attachments: operator-error.txt
>
>
> I deployed a problematic cluster (HA enabled with 3 JMs) to check the cluster
> update process. The cluster was in a restart loop. Then I provided a newer
> CRD (with several updated configurations) and expected the cluster to be
> redeployed. However, the following exception occurred:
>
> Caused by: java.lang.NullPointerException
>     at org.apache.flink.kubernetes.operator.service.CheckpointHistoryWrapper.getInProgressCheckpoint(CheckpointHistoryWrapper.java:60)
>     at org.apache.flink.kubernetes.operator.service.AbstractFlinkService.getCheckpointInfo(AbstractFlinkService.java:564)
>     at org.apache.flink.kubernetes.operator.service.AbstractFlinkService.getLastCheckpoint(AbstractFlinkService.java:520)
>     at org.apache.flink.kubernetes.operator.observer.SavepointObserver.observeLatestSavepoint(SavepointObserver.java:209)
>     at org.apache.flink.kubernetes.operator.observer.SavepointObserver.observeSavepointStatus(SavepointObserver.java:73)
>     at org.apache.flink.kubernetes.operator.observer.deployment.ApplicationObserver.observeFlinkCluster(ApplicationObserver.java:61)
>     at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:73)
>     at org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:53)
>     at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:134)
>
>
> upgradeMode was initially `last-state`; I then changed it to `stateless`, but
> the new cluster still was not deployed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
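
For illustration only, the null check suggested in the comment could look roughly like the sketch below. This is not the operator's actual code: the class CheckpointHistorySketch, the nested Checkpoint type, and the "IN_PROGRESS" status string are hypothetical stand-ins chosen for the example, and the real CheckpointHistoryWrapper may be structured quite differently. The sketch only shows the general idea of treating a missing history, a missing entry, or a missing status as "no in-progress checkpoint" rather than dereferencing it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified stand-in; NOT the operator's CheckpointHistoryWrapper.
class CheckpointHistorySketch {

    /** Hypothetical model of one checkpoint entry from a REST response. */
    static final class Checkpoint {
        final long id;
        final String status; // may be null if the response was incomplete

        Checkpoint(long id, String status) {
            this.id = id;
            this.status = status;
        }
    }

    /**
     * Returns the first in-progress checkpoint, if any.
     * A null history, null entries, and null statuses are all tolerated
     * and simply yield Optional.empty() or are skipped.
     */
    static Optional<Checkpoint> getInProgressCheckpoint(List<Checkpoint> history) {
        if (history == null) {
            return Optional.empty();
        }
        return history.stream()
                // Constant-first equals() is safe when status is null.
                .filter(c -> c != null && "IN_PROGRESS".equals(c.status))
                .findFirst();
    }

    public static void main(String[] args) {
        // A history containing a null entry and an entry with a null status.
        List<Checkpoint> history = Arrays.asList(
                null,
                new Checkpoint(1L, null),
                new Checkpoint(2L, "IN_PROGRESS"));

        System.out.println(getInProgressCheckpoint(null).isPresent());
        System.out.println(getInProgressCheckpoint(history).map(c -> c.id).orElse(-1L));
    }
}
```

Returning an empty Optional lets the caller decide how to proceed (for example, by skipping the observation step) instead of failing the whole reconcile loop with a NullPointerException.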