akoskuczi-bw opened a new issue, #13471: URL: https://github.com/apache/cloudstack/issues/13471
### problem <p class="font-claude-response-body break-words whitespace-normal" data-sourcepos="39:1-43:55;1334-1737">When creating a CKS Kubernetes cluster and <strong>explicitly selecting a non-default node template</strong> (the stock <code class="bg-text-200/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]">cks-ubuntu-2204-kvm</code> CKS-ready image) in Advanced Settings, the node <strong>Instance</strong> stays in the <code class="bg-text-200/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]">Starting</code> <strong>VM lifecycle state</strong> indefinitely and never transitions to <code class="bg-text-200/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]">Running</code> — even though the libvirt domain is actually up on the KVM host (the VNC console is reachable and the guest OS boots).</p> <p class="font-claude-response-body break-words whitespace-normal" data-sourcepos="45:1-49:52;1739-2127">The same cluster creation <strong>succeeds</strong> when the <strong>default SystemVM template</strong> (<code class="bg-text-200/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]">systemvm-kvm-4.22.0-x86_64</code>) is used for the nodes. 
The only changed variable between the working and failing cases is the node template selection, which points at the <strong>Flexible Kubernetes Clusters per-node template selection path</strong> rather than at in-guest provisioning (cloud-init / kubeadm / CNI).</p> <h2 class="text-text-100 mt-3 -mb-1 text-[1.125rem] font-bold" data-sourcepos="55:1-55:22;2375-2396">STEPS TO REPRODUCE</h2> <ol class="[li_&]:mb-0 [li_&]:mt-1 [li_&]:gap-1 [&:not(:last-child)_ul]:pb-1 [&:not(:last-child)_ol]:pb-1 list-decimal flex flex-col gap-1 pl-8 mb-3" data-sourcepos="57:1-68:24;2398-3079"> <li class="font-claude-response-body whitespace-normal break-words pl-2" data-sourcepos="57:1-57:37;2398-2434">ACS 4.22.1.0 on KVM, CKS enabled.</li> <li class="font-claude-response-body whitespace-normal break-words pl-2" data-sourcepos="58:1-59:21;2435-2538">Register the stock CKS-ready Ubuntu 22.04 KVM template (<code class="bg-text-200/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]">cks-ubuntu-2204-kvm</code>), marked "For CKS".</li> <li class="font-claude-response-body whitespace-normal break-words pl-2" data-sourcepos="60:1-60:57;2539-2595">Register a supported Kubernetes binaries ISO/version.</li> <li class="font-claude-response-body whitespace-normal break-words pl-2" data-sourcepos="61:1-63:13;2596-2780">Create a CloudManaged Kubernetes cluster (3 control node HA) on a <strong>VPC tier</strong> network, and in <strong>Advanced Settings explicitly select <code class="bg-text-200/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]">cks-ubuntu-2204-kvm</code></strong> as the node template.</li> <li class="font-claude-response-body whitespace-normal break-words pl-2" data-sourcepos="64:1-65:27;2781-2886">Observe the control node Instance in the UI (Instances → the control node): it stays in <code class="bg-text-200/5 border border-0.5 border-border-300 
text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]">Starting</code>.</li> <li class="font-claude-response-body whitespace-normal break-words pl-2" data-sourcepos="66:1-68:24;2887-3079">Repeat the exact same cluster creation but <strong>do not select a template</strong> (use the default <code class="bg-text-200/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]">systemvm-kvm-4.22.0-x86_64</code>): the node reaches <code class="bg-text-200/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]">Running</code> and the cluster provisions normally.</li> </ol> <h2 class="text-text-100 mt-3 -mb-1 text-[1.125rem] font-bold" data-sourcepos="70:1-70:20;3081-3100">EXPECTED RESULTS</h2> <p class="font-claude-response-body break-words whitespace-normal" data-sourcepos="72:1-75:11;3102-3367">Selecting a "For CKS" template in Advanced Settings should behave the same as the default template path: the node Instance transitions <code class="bg-text-200/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]">Starting</code> → <code class="bg-text-200/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]">Running</code> within the normal VM start window, after which CKS provisioning proceeds and the cluster reaches <code class="bg-text-200/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]">Running</code>.</p> <h2 class="text-text-100 mt-3 -mb-1 text-[1.125rem] font-bold" data-sourcepos="77:1-77:18;3369-3386">ACTUAL RESULTS</h2> <p class="font-claude-response-body break-words whitespace-normal" data-sourcepos="79:1-79:57;3388-3444">Behaviour depends solely on the node template selection:</p> <div class="overflow-x-auto w-full px-2 mb-6" 
data-sourcepos="81:1-85:106;3446-3800">
<table>
<tbody>
<tr><td>Plain Instance deployed from cks-ubuntu-2204-kvm (no CKS)</td><td>Reaches Running: works</td></tr>
<tr><td>CKS node from default SystemVM template systemvm-kvm-4.22.0-x86_64</td><td>Reaches Running, cluster provisions: works</td></tr>
<tr><td>CKS node from selected cks-ubuntu-2204-kvm template</td><td>Instance stuck in Starting indefinitely</td></tr>
</tbody>
</table>
</div>
<p class="font-claude-response-body break-words whitespace-normal" data-sourcepos="87:1-87:21;3802-3822">In the failing case:</p>
<ul class="[li_&]:mb-0 [li_&]:mt-1 [li_&]:gap-1 [&:not(:last-child)_ul]:pb-1 [&:not(:last-child)_ol]:pb-1 list-disc flex flex-col gap-1 pl-8 mb-3" data-sourcepos="89:1-95:47;3824-4284">
<li class="font-claude-response-body whitespace-normal break-words pl-2" data-sourcepos="89:1-90:40;3824-3951">The control node Instance remains in <code class="bg-text-200/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]">Starting</code> (VM lifecycle state) indefinitely, so the cluster also stays in <code class="bg-text-200/5 border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]">Starting</code>.</li>
<li class="font-claude-response-body whitespace-normal break-words pl-2" data-sourcepos="91:1-92:26;3952-4065">The libvirt domain is genuinely running on the KVM host: the VNC console is reachable and the guest OS boots.</li>
<li class="font-claude-response-body whitespace-normal break-words pl-2" data-sourcepos="93:1-95:47;4066-4284">SSH to the node (port 2222) is not usable; the UI-displayed password is not accepted on the VNC console (consistent with the node never being marked Running and/or the start/provisioning workflow not completing).</li>
</ul>

### versions

Hypervisor version: KVM Ubuntu 24.04
ACS version: 4.22.1.0
Management server: Ubuntu 24.04

### The steps to reproduce the bug

_No response_

### What to do about it?

_No response_
