Re: [I] Degraded cloudstack agent [cloudstack]

via GitHub Tue, 25 Nov 2025 22:30:57 -0800


TadiosAbebe commented on issue #11141:
URL: https://github.com/apache/cloudstack/issues/11141#issuecomment-3579480031


   > I have a test all-in-one ACS on Ubuntu 24.04 with libvirt 10.0.0, but I couldn't reproduce the issue I'm seeing in the production environment. I repeatedly ran your test script:
   > 
   > ```
   > for i in `seq 1 20`;do
   >     cmk deploy virtualmachine name=L2-wei-test-$i serviceofferingid=xxx zoneid=xxx templateid=xxx networkids=xxx >/dev/null &
   >     sleep 2;
   > done
   > ```
   > 
   > to generate load, and the results were consistently fast:
   > 
   > ```
   > mysql> select id,name,created,update_time,(update_time-created) from vm_instance where removed is null and name like "L2-wei%";
   > +-----+----------------+---------------------+---------------------+-----------------------+
   > | id  | name           | created             | update_time         | (update_time-created) |
   > +-----+----------------+---------------------+---------------------+-----------------------+
   > | 191 | L2-wei-test-1  | 2025-11-25 11:22:07 | 2025-11-25 11:22:14 |                     7 |
   > | 192 | L2-wei-test-2  | 2025-11-25 11:22:09 | 2025-11-25 11:22:16 |                     7 |
   > | 193 | L2-wei-test-3  | 2025-11-25 11:22:11 | 2025-11-25 11:22:17 |                     6 |
   > | 194 | L2-wei-test-4  | 2025-11-25 11:22:13 | 2025-11-25 11:22:22 |                     9 |
   > | 195 | L2-wei-test-5  | 2025-11-25 11:22:15 | 2025-11-25 11:22:20 |                     5 |
   > | 196 | L2-wei-test-6  | 2025-11-25 11:22:17 | 2025-11-25 11:22:23 |                     6 |
   > | 197 | L2-wei-test-7  | 2025-11-25 11:22:19 | 2025-11-25 11:22:26 |                     7 |
   > | 198 | L2-wei-test-8  | 2025-11-25 11:22:21 | 2025-11-25 11:22:27 |                     6 |
   > | 199 | L2-wei-test-9  | 2025-11-25 11:22:23 | 2025-11-25 11:22:29 |                     6 |
   > | 200 | L2-wei-test-10 | 2025-11-25 11:22:25 | 2025-11-25 11:22:31 |                     6 |
   > | 201 | L2-wei-test-11 | 2025-11-25 11:22:27 | 2025-11-25 11:22:34 |                     7 |
   > | 202 | L2-wei-test-12 | 2025-11-25 11:22:29 | 2025-11-25 11:22:36 |                     7 |
   > | 203 | L2-wei-test-13 | 2025-11-25 11:22:31 | 2025-11-25 11:22:38 |                     7 |
   > | 204 | L2-wei-test-14 | 2025-11-25 11:22:33 | 2025-11-25 11:22:41 |                     8 |
   > | 205 | L2-wei-test-15 | 2025-11-25 11:22:35 | 2025-11-25 11:22:42 |                     7 |
   > | 206 | L2-wei-test-16 | 2025-11-25 11:22:37 | 2025-11-25 11:22:45 |                     8 |
   > | 207 | L2-wei-test-17 | 2025-11-25 11:22:39 | 2025-11-25 11:22:48 |                     9 |
   > | 208 | L2-wei-test-18 | 2025-11-25 11:22:41 | 2025-11-25 11:22:49 |                     8 |
   > | 209 | L2-wei-test-19 | 2025-11-25 11:22:43 | 2025-11-25 11:22:51 |                     8 |
   > | 210 | L2-wei-test-20 | 2025-11-25 11:22:45 | 2025-11-25 11:22:55 |                    10 |
   > +-----+----------------+---------------------+---------------------+-----------------------+
   > ```
   > 
   > Later this week, I'll try to set up a full test environment with Ceph and multiple KVM hosts to see whether I can replicate the issue there and whether libvirt 10.6.0 fixes it.
   
   My update:
   After yesterday's test, I left the all-in-one ACS running for a while without restarting libvirtd or cloudstack-agent. Initially, the resource utilization of the java process was:
   ```
   CPU:  0.5%
   MEM:  2.4%
   FD: 253
   Threads: 83
   Conn: 1
   ```
   After about 6 or 7 hours, CPU usage had risen to about 1.1% without any interaction on the host, with only the 20 small cirros VMs launched earlier still running. I then simulated some workload by creating and destroying those 20 instances in a loop. After about 10 hours, CPU utilization had increased to about 4.5%:
   ```
   CPU:  4.5%
   MEM:  2.6%
   FD: 253
   Threads: 78
   Conn: 1
   ```
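   To track the FD and thread counters over time, both can be read straight from /proc. The following is only a minimal sketch, not necessarily how the figures above were collected: `agent_snapshot` is a hypothetical helper name, and it assumes a Linux host and permission to read the target process's /proc entries.

   ```shell
   # Hypothetical helper: print open file descriptor and thread counts
   # for a single PID, using only /proc (Linux only).
   agent_snapshot() {
       pid="$1"
       if [ ! -d "/proc/$pid" ]; then
           echo "no such process: $pid" >&2
           return 1
       fi
       # Each entry under /proc/<pid>/fd is one open file descriptor.
       fds=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
       # The kernel reports the thread count in /proc/<pid>/status.
       threads=$(awk '/^Threads:/ {print $2}' "/proc/$pid/status")
       printf 'FD: %s\nThreads: %s\n' "$fds" "$threads"
   }
   ```

   For example, `agent_snapshot "$(pgrep -f cloudstack-agent | head -n 1)"`; the `pgrep` pattern is an assumption and may need adjusting to match the agent's JVM on a given host.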
   One thing I noticed when looking at the java process in htop: in our production cluster, 5 or 6 java processes have a high CPU time. For example, on one of our compute hosts about 4 processes have a CPU time of around 53:54:30. Is this normal?
   
   If I restart libvirtd, the java processes with high CPU time go away and the maximum CPU time becomes 0:05:26.
   
   I'll now replace libvirt with 10.6.0 on the all-in-one instance and see what changes.
   
   


