Tamir Sagi created FLINK-32111:
----------------------------------

             Summary: Replacing cluster in failed state with a new one failed
                 Key: FLINK-32111
                 URL: https://issues.apache.org/jira/browse/FLINK-32111
             Project: Flink
          Issue Type: Bug
          Components: Kubernetes Operator
            Reporter: Tamir Sagi
         Attachments: operator-error.txt

I deployed a problematic cluster (HA enabled, with 3 JMs) to check the cluster update process. The cluster was stuck in a restart loop.

I then provided a newer CRD (with several configurations updated) and expected the cluster to be redeployed. However, the following exception was thrown:

Caused by: java.lang.NullPointerException
    at org.apache.flink.kubernetes.operator.service.CheckpointHistoryWrapper.getInProgressCheckpoint(CheckpointHistoryWrapper.java:60)
    at org.apache.flink.kubernetes.operator.service.AbstractFlinkService.getCheckpointInfo(AbstractFlinkService.java:564)
    at org.apache.flink.kubernetes.operator.service.AbstractFlinkService.getLastCheckpoint(AbstractFlinkService.java:520)
    at org.apache.flink.kubernetes.operator.observer.SavepointObserver.observeLatestSavepoint(SavepointObserver.java:209)
    at org.apache.flink.kubernetes.operator.observer.SavepointObserver.observeSavepointStatus(SavepointObserver.java:73)
    at org.apache.flink.kubernetes.operator.observer.deployment.ApplicationObserver.observeFlinkCluster(ApplicationObserver.java:61)
    at org.apache.flink.kubernetes.operator.observer.deployment.AbstractFlinkDeploymentObserver.observeInternal(AbstractFlinkDeploymentObserver.java:73)
    at org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:53)
    at org.apache.flink.kubernetes.operator.controller.FlinkDeploymentController.reconcile(FlinkDeploymentController.java:134)

upgradeMode was initially `last-state`. I then changed it to `stateless`, but the new cluster still was not deployed.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)