akoskuczi-bw opened a new issue, #13471:
URL: https://github.com/apache/cloudstack/issues/13471

   ### problem
   
   <p class="font-claude-response-body break-words whitespace-normal" 
data-sourcepos="39:1-43:55;1334-1737">When creating a CKS Kubernetes cluster 
and <strong>explicitly selecting a non-default node
   template</strong> (the stock <code class="bg-text-200/5 border border-0.5 
border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 
py-px text-[0.9rem]">cks-ubuntu-2204-kvm</code> CKS-ready image) in Advanced 
Settings, the
   node <strong>Instance</strong> stays in the <code class="bg-text-200/5 
border border-0.5 border-border-300 text-danger-000 whitespace-pre-wrap 
rounded-[0.4rem] px-1 py-px text-[0.9rem]">Starting</code> <strong>VM lifecycle 
state</strong> indefinitely and never
   transitions to <code class="bg-text-200/5 border border-0.5 
border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 
py-px text-[0.9rem]">Running</code> — even though the libvirt domain is 
actually up on the KVM host
   (the VNC console is reachable and the guest OS boots).</p>
   
   <p class="font-claude-response-body break-words whitespace-normal" 
data-sourcepos="45:1-49:52;1739-2127">The same cluster creation 
<strong>succeeds</strong> when the <strong>default SystemVM template</strong>
   (<code class="bg-text-200/5 border border-0.5 border-border-300 
text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px 
text-[0.9rem]">systemvm-kvm-4.22.0-x86_64</code>) is used for the nodes. The 
only variable that differs between
   the working and failing cases is the node template selection, which
implicates the
   <strong>Flexible Kubernetes Clusters per-node template selection 
path</strong> rather than
   in-guest provisioning (cloud-init / kubeadm / CNI).</p>
   
   
   
   <h2 class="text-text-100 mt-3 -mb-1 text-[1.125rem] font-bold" 
data-sourcepos="55:1-55:22;2375-2396">STEPS TO REPRODUCE</h2>
   
   <ol class="[li_&amp;]:mb-0 [li_&amp;]:mt-1 [li_&amp;]:gap-1 
[&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-decimal 
flex flex-col gap-1 pl-8 mb-3" data-sourcepos="57:1-68:24;2398-3079">
   <li class="font-claude-response-body whitespace-normal break-words pl-2" 
data-sourcepos="57:1-57:37;2398-2434">ACS 4.22.1.0 on KVM, CKS enabled.</li>
   <li class="font-claude-response-body whitespace-normal break-words pl-2" 
data-sourcepos="58:1-59:21;2435-2538">Register the stock CKS-ready Ubuntu 22.04 
KVM template (<code class="bg-text-200/5 border border-0.5 border-border-300 
text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px 
text-[0.9rem]">cks-ubuntu-2204-kvm</code>),
   marked "For CKS".</li>
   <li class="font-claude-response-body whitespace-normal break-words pl-2" 
data-sourcepos="60:1-60:57;2539-2595">Register a supported Kubernetes binaries 
ISO/version.</li>
   <li class="font-claude-response-body whitespace-normal break-words pl-2" 
data-sourcepos="61:1-63:13;2596-2780">Create a CloudManaged Kubernetes cluster 
(HA, 3 control nodes) on a <strong>VPC tier</strong> network,
   and in <strong>Advanced Settings explicitly select <code 
class="bg-text-200/5 border border-0.5 border-border-300 text-danger-000 
whitespace-pre-wrap rounded-[0.4rem] px-1 py-px 
text-[0.9rem]">cks-ubuntu-2204-kvm</code></strong> as the node
   template.</li>
   <li class="font-claude-response-body whitespace-normal break-words pl-2" 
data-sourcepos="64:1-65:27;2781-2886">Observe the control node Instance in the 
UI (Instances → the control node):
   it stays in <code class="bg-text-200/5 border border-0.5 border-border-300 
text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px 
text-[0.9rem]">Starting</code>.</li>
   <li class="font-claude-response-body whitespace-normal break-words pl-2" 
data-sourcepos="66:1-68:24;2887-3079">Repeat the exact same cluster creation 
but <strong>do not select a template</strong> (use the
   default <code class="bg-text-200/5 border border-0.5 border-border-300 
text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px 
text-[0.9rem]">systemvm-kvm-4.22.0-x86_64</code>): the node reaches <code 
class="bg-text-200/5 border border-0.5 border-border-300 text-danger-000 
whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]">Running</code> 
and the cluster
   provisions normally.</li>
   </ol>
   
   <h2 class="text-text-100 mt-3 -mb-1 text-[1.125rem] font-bold" 
data-sourcepos="70:1-70:20;3081-3100">EXPECTED RESULTS</h2>
   
   <p class="font-claude-response-body break-words whitespace-normal" 
data-sourcepos="72:1-75:11;3102-3367">Selecting a "For CKS" template in 
Advanced Settings should behave the same as the
   default template path: the node Instance transitions <code 
class="bg-text-200/5 border border-0.5 border-border-300 text-danger-000 
whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]">Starting</code> 
→ <code class="bg-text-200/5 border border-0.5 border-border-300 
text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px 
text-[0.9rem]">Running</code> within the
   normal VM start window, after which CKS provisioning proceeds and the 
cluster reaches
   <code class="bg-text-200/5 border border-0.5 border-border-300 
text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 py-px 
text-[0.9rem]">Running</code>.</p>
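
The expected transition can be checked with a bounded poll. The following is only a minimal sketch: `get_state` is a hypothetical callable that would return the Instance state string (for example by wrapping a `listVirtualMachines` API call), and the 600-second window is an assumed value, not a CloudStack default.

```python
import time
from typing import Callable


def wait_for_state(
    get_state: Callable[[], str],
    target: str = "Running",
    timeout_s: float = 600.0,   # assumed start window, not a CloudStack default
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_state() until it returns `target` or the timeout elapses.

    get_state is a hypothetical hook; in practice it would query the
    management server for the Instance's current state.
    """
    deadline = clock() + timeout_s
    while True:
        if get_state() == target:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In the failing case described in this issue, such a poll would be expected to return `False`, because the Instance never leaves `Starting`.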
   
   <h2 class="text-text-100 mt-3 -mb-1 text-[1.125rem] font-bold" 
data-sourcepos="77:1-77:18;3369-3386">ACTUAL RESULTS</h2>
   
   <p class="font-claude-response-body break-words whitespace-normal" 
data-sourcepos="79:1-79:57;3388-3444">Behaviour depends solely on the node 
template selection:</p>
   
   <div class="overflow-x-auto w-full px-2 mb-6" 
data-sourcepos="81:1-85:106;3446-3800">
   
   <table>
   <tr><th>Scenario</th><th>Result</th></tr>
   <tr><td>Plain Instance deployed from cks-ubuntu-2204-kvm (no CKS)</td><td>Reaches Running (works)</td></tr>
   <tr><td>CKS node from default SystemVM template systemvm-kvm-4.22.0-x86_64</td><td>Reaches Running, cluster provisions (works)</td></tr>
   <tr><td>CKS node from selected cks-ubuntu-2204-kvm template</td><td>Instance stuck in Starting indefinitely</td></tr>
   </table>
   
   </div>
   
   <p class="font-claude-response-body break-words whitespace-normal" 
data-sourcepos="87:1-87:21;3802-3822">In the failing case:</p>
   
   <ul class="[li_&amp;]:mb-0 [li_&amp;]:mt-1 [li_&amp;]:gap-1 
[&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-disc 
flex flex-col gap-1 pl-8 mb-3" data-sourcepos="89:1-95:47;3824-4284">
   <li class="font-claude-response-body whitespace-normal break-words pl-2" 
data-sourcepos="89:1-90:40;3824-3951">The control node Instance remains in 
<code class="bg-text-200/5 border border-0.5 border-border-300 text-danger-000 
whitespace-pre-wrap rounded-[0.4rem] px-1 py-px text-[0.9rem]">Starting</code> 
(VM lifecycle state) indefinitely, so
   the cluster also stays in <code class="bg-text-200/5 border border-0.5 
border-border-300 text-danger-000 whitespace-pre-wrap rounded-[0.4rem] px-1 
py-px text-[0.9rem]">Starting</code>.</li>
   <li class="font-claude-response-body whitespace-normal break-words pl-2" 
data-sourcepos="91:1-92:26;3952-4065">The libvirt domain is genuinely running 
on the KVM host: the VNC console is reachable
   and the guest OS boots.</li>
   <li class="font-claude-response-body whitespace-normal break-words pl-2" 
data-sourcepos="93:1-95:47;4066-4284">SSH to the node (port 2222) is not 
usable; the UI-displayed password is not accepted
   on the VNC console (consistent with the node never being marked Running 
and/or the
   start/provisioning workflow not completing).</li></ul>
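
The mismatch between the CloudStack state and the hypervisor state can be made explicit with a small check. This is a sketch under assumptions: it is meant to run on the KVM host with access to `virsh`, the domain name passed to `libvirt_domain_state` is whatever internal name the Instance has on that host, and the CloudStack state string is supplied by hand or by some other tool.

```python
import subprocess


def libvirt_domain_state(domain: str) -> str:
    """Return the output of `virsh domstate <domain>`, e.g. 'running'."""
    result = subprocess.run(
        ["virsh", "domstate", domain],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def diagnose(cloudstack_state: str, libvirt_state: str) -> str:
    """Compare the state CloudStack reports with the libvirt domain state."""
    cs = cloudstack_state.strip().lower()
    lv = libvirt_state.strip().lower()
    if cs == "starting" and lv == "running":
        # The situation reported in this issue.
        return "mismatch: domain is up but CloudStack has not marked the Instance Running"
    if cs == "running" and lv == "running":
        return "consistent: running"
    return f"other: cloudstack={cs}, libvirt={lv}"
```

For example, `diagnose("Starting", libvirt_domain_state(name))` would report the mismatch for the stuck control node, where `name` is the node's libvirt domain name.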
   
   ### versions
   
   Hypervisor version: KVM Ubuntu 24.04
   ACS version: 4.22.1.0
   Management server: Ubuntu 24.04
   
   ### The steps to reproduce the bug
   
   _No response_
   
   ### What to do about it?
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
