kiranchavala opened a new issue, #11581: URL: https://github.com/apache/cloudstack/issues/11581
### problem

CKS: NPE when trying to remove an external node from a CKS cluster

### versions

ACS 4.20.x

### The steps to reproduce the bug

1. Register an external CKS template: https://download.cloudstack.org/testing/custom_templates/ubuntu/22.04/22.04/cks-ubuntu-2204-kvm.qcow2.bz2
2. Launch a CKS cluster.
3. Launch an Ubuntu VM with the template mentioned above.
4. Once the Ubuntu VM boots up, add the management server public key to it.
5. Add the Ubuntu VM as an external node to the CKS cluster.

<img width="590" height="384" alt="Image" src="https://github.com/user-attachments/assets/e561e608-75f6-406e-9681-b3c4eddd2745" />

6. The CKS cluster will be in the importing state.
7. The external node ends up in the not-ready state because its disk runs out of space. Log in to the external node and check cloud-init-output.log:

```
unpacking registry.k8s.io/etcd:3.5.21-0 (sha256:d58c035df557080a27387d687092e3fc2b64c6d0e3162dc51453a115f847d121)...time="2025-09-04T09:32:05Z" level=info msg="apply failure, attempting cleanup" error="failed to extract layer sha256:edcdf51bd97dae2c7c6a75ab21cf445d5997888402357f5cc36e7582543431ac: write /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/45/fs/usr/local/bin/etcd-3.4.18: no space left on device: unknown" key="extract-939483347-ZepJ sha256:c6230a0bcc0db1264e316a45b18c0a8dfab3c4818a4245035770b4e58967e035"
time="2025-09-04T09:32:05Z" level=warning msg="extraction snapshot removal failed" error="write /var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db: no space left on device: unknown" key="extract-939483347-ZepJ sha256:c6230a0bcc0db1264e316a45b18c0a8dfab3c4818a4245035770b4e58967e035"
ctr: failed to extract layer sha256:edcdf51bd97dae2c7c6a75ab21cf445d5997888402357f5cc36e7582543431ac: write /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/45/fs/usr/local/bin/etcd-3.4.18: no space left on device: unknown
ctr: failed to ingest "blobs/sha256/0038afa1c30b6e7c6ed64ebbb3593756f0a5328da72cf3304be62b27cb40139a": failed to open writer: mkdir /var/lib/containerd/io.containerd.content.v1.content/ingest/f2e067fbba1183a1b7465f8eb36511a50660708c165d166d396d63d79880c233: no space left on device: unknown
ctr: failed to ingest "blobs/sha256/0038afa1c30b6e7c6ed64ebbb3593756f0a5328da72cf3304be62b27cb40139a": failed to open writer: mkdir /var/lib/containerd/io.containerd.content.v1.content/ingest/f2e067fbba1183a1b7465f8eb36511a50660708c165d166d396d63d79880c233: no space left on device: unknown
Loading docker image /mnt/k8sdisk//docker/etcd:3.5.21-0.tar failed!
ctr: failed to ingest "blobs/sha256/0bab15eea81d0fe6ab56ebf5fba14e02c4c1775a7f7436fbddd3505add4e18fa": failed to open writer: mkdir /var/lib/containerd/io.containerd.content.v1.content/ingest/e32451d5bcc4edc020e31aecb569918a2e63aca4585fb9735e3e19ad02b818a0: no space left on device: unknown
ctr: failed to ingest "blobs/sha256/0bab15eea81d0fe6ab56ebf5fba14e02c4c1775a7f7436fbddd3505add4e18fa": failed to open writer: mkdir /var/lib/containerd/io.containerd.content.v1.content/ingest/e32451d5bcc4edc020e31aecb569918a2e63aca4585fb9735e3e19ad02b818a0: no space left on device: unknown
ctr: failed to ingest "blobs/sha256/0bab15eea81d0fe6ab56ebf5fba14e02c4c1775a7f7436fbddd3505add4e18fa": failed to open writer: mkdir /var/lib/containerd/io.containerd.content.v1.content/ingest/e32451d5bcc4edc020e31aecb569918a2e63aca4585fb9735e3e19ad02b818a0: no space left on device: unknown
```

8. Stop the CKS cluster in order to remove the external node.
9. The CKS cluster goes into the stopped state, but the job that adds the external node keeps running:

```
[root@ref-trl-9383-k-Mol8-kiran-chavala-mgmt1 ~]# tail -f /var/log/cloudstack/management/management-server.log |grep job-295
2025-09-04 10:07:11,732 DEBUG [c.c.k.c.u.KubernetesClusterUtil] (API-Job-Executor-52:[ctx-09df8212, job-295, ctx-993d72f7]) (logid:00b69bc8) Checking ready nodes for the Kubernetes cluster KubernetesCluster {"id":9,"name":"isolated","uuid":"8663f80d-3ce7-47f4-b852-4eb145453e0a"} with total 3 provisioned nodes
2025-09-04 10:07:12,529 DEBUG [c.c.k.c.u.KubernetesClusterUtil] (API-Job-Executor-52:[ctx-09df8212, job-295, ctx-993d72f7]) (logid:00b69bc8) Kubernetes cluster KubernetesCluster {"id":9,"name":"isolated","uuid":"8663f80d-3ce7-47f4-b852-4eb145453e0a"} has total 3 provisioned nodes while 2 ready now
2025-09-04 10:07:16,880 WARN [o.a.c.f.j.i.AsyncJobMonitor] (Timer-0:[ctx-2caef2bd]) (logid:299149e7) Task (job-295) has been pending for 2208 seconds
2025-09-04 10:07:27,530 DEBUG [c.c.k.c.u.KubernetesClusterUtil] (API-Job-Executor-52:[ctx-09df8212, job-295, ctx-993d72f7]) (logid:00b69bc8) Checking ready nodes for the Kubernetes cluster KubernetesCluster {"id":9,"name":"isolated","uuid":"8663f80d-3ce7-47f4-b852-4eb145453e0a"} with total 3 provisioned nodes
2025-09-04 10:07:28,209 DEBUG [c.c.k.c.u.KubernetesClusterUtil] (API-Job-Executor-52:[ctx-09df8212, job-295, ctx-993d72f7]) (logid:00b69bc8) Kubernetes cluster KubernetesCluster {"id":9,"name":"isolated","uuid":"8663f80d-3ce7-47f4-b852-4eb145453e0a"} has total 3 provisioned nodes while 2 ready now
```

10. Destroy the external node and start the CKS cluster. An exception is observed:

<img width="521" height="226" alt="Image" src="https://github.com/user-attachments/assets/658e4714-f933-4a36-a93f-97da589490e5" />

11. The CKS cluster remains in the alert state.
12. Destroy the CKS cluster. An exception is thrown:

<img width="1470" height="956" alt="Image" src="https://github.com/user-attachments/assets/bee7acb9-6f9c-4165-8ca1-935324b832d9" />

```
2025-09-04 11:01:47,260 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-64:[ctx-bc12e86e, job-333]) (logid:2e067630) Unexpected exception while executing org.apache.cloudstack.api.command.user.kubernetes.cluster.DeleteKubernetesClusterCmd
java.lang.NullPointerException: Cannot invoke "com.cloud.vm.VMInstanceVO.getBackupOfferingId()" because "vm" is null
	at com.cloud.kubernetes.cluster.KubernetesClusterManagerImpl.checkIfVmsAssociatedWithBackupOffering(KubernetesClusterManagerImpl.java:2010)
	at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterDestroyWorker.destroy(KubernetesClusterDestroyWorker.java:267)
	at com.cloud.kubernetes.cluster.KubernetesClusterManagerImpl.destroyKubernetesCluster(KubernetesClusterManagerImpl.java:2384)
	at com.cloud.kubernetes.cluster.KubernetesClusterManagerImpl.destroyKubernetesCluster(KubernetesClusterManagerImpl.java:2392)
	at com.cloud.kubernetes.cluster.KubernetesClusterManagerImpl.deleteKubernetesCluster(KubernetesClusterManagerImpl.java:1969)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
	at org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:109)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
	at com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:52)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
	at jdk.proxy3/jdk.proxy3.$Proxy534.deleteKubernetesCluster(Unknown Source)
	at org.apache.cloudstack.api.command.user.kubernetes.cluster.DeleteKubernetesClusterCmd.execute(DeleteKubernetesClusterCmd.java:95)
	at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:173)
	at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:110)
```

...

### What to do about it?

Workaround: deploy the external Ubuntu node with a root disk larger than 20 GB.

The NPE needs to be fixed, because the CKS cluster remains in the alert state and cannot be destroyed.
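
The stack trace above shows `vm` being null inside `checkIfVmsAssociatedWithBackupOffering`, presumably because the destroyed external node is still listed for the cluster while its VM record can no longer be loaded. One possible direction is to skip such entries before dereferencing them. The following is only a minimal, self-contained sketch of that idea: `Vm`, `findById`, and `anyVmHasBackupOffering` are hypothetical stand-ins and not the actual CloudStack classes or method signatures, so the real fix may look different.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BackupOfferingCheckSketch {

    /** Hypothetical stand-in for VMInstanceVO, reduced to the one field used here. */
    static final class Vm {
        final long id;
        final Long backupOfferingId; // null when no backup offering is assigned

        Vm(long id, Long backupOfferingId) {
            this.id = id;
            this.backupOfferingId = backupOfferingId;
        }
    }

    /** Hypothetical lookup: returns null when the VM record no longer exists. */
    static Vm findById(Map<Long, Vm> store, long vmId) {
        return store.get(vmId);
    }

    /** Returns true if any VM that can still be loaded has a backup offering. */
    static boolean anyVmHasBackupOffering(Map<Long, Vm> store, List<Long> clusterVmIds) {
        for (Long vmId : clusterVmIds) {
            Vm vm = findById(store, vmId);
            if (vm == null) {
                // Assumption: a missing record (e.g. a destroyed external node)
                // cannot block cluster deletion, so it is skipped instead of dereferenced.
                continue;
            }
            if (vm.backupOfferingId != null) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<Long, Vm> store = new HashMap<>();
        store.put(1L, new Vm(1L, null));
        // VM 2 is listed for the cluster but has no record, like the destroyed node.
        List<Long> clusterVmIds = Arrays.asList(1L, 2L);
        System.out.println(anyVmHasBackupOffering(store, clusterVmIds)); // prints false
    }
}
```

Whether a missing VM should be silently skipped or logged and cleaned up from the cluster's VM mapping is a design decision for the maintainers; the sketch only shows that the check can complete without throwing.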
