vgarcia-linube commented on issue #11141: URL: https://github.com/apache/cloudstack/issues/11141#issuecomment-3292786693
The problem seems to be a leak in the agent's handler threads while checking storage usage. The more agent threads you configure in agent.properties and the shorter the interval you configure for retrieving volume usage metrics, the worse it gets and the faster it happens.

On a fresh start of the agent, you get the 'Trying to fetch storage pool xxxx from libvirt' message whenever the usage service fetches updated metrics. Those requests seem to be leaking, or not getting garbage collected in time, or something along those lines. Over time they start to overlap, and you end up seeing the same request to the same primary storage tens or hundreds of times.

The only way to recover is to restart the agent. To mitigate it, limit the number of agent threads and read the usage metrics at longer intervals. I think the default is 10 minutes or something like that; setting it to once every two hours helps a bit, just enough that you don't have to restart the agent every few hours to stop it from hogging the KVM node's CPU.

Here's a log with the storage UUIDs redacted so it's easier to see (take a look at the timestamps):
[storage-log-no-uuids.log](https://github.com/user-attachments/files/22346459/storage-log-no-uuids.log)

This has been happening since at least CloudStack 4.19.
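To check whether the requests are piling up on a given node, a rough way is to count how often the 'Trying to fetch storage pool' message shows up per pool per minute in the agent log. The snippet below is only a minimal sketch: the function names and the threshold are made up for illustration, and it assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, which may not match your log format.

```python
import re
from collections import Counter

# Assumption: each agent log line starts with "YYYY-MM-DD HH:MM:SS" and the
# message text matches the one quoted in the comment above.
FETCH_RE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}.*"
    r"Trying to fetch storage pool (?P<pool>\S+) from libvirt"
)


def count_fetches_per_minute(lines):
    """Count fetch messages per (minute, pool) pair. Hypothetical helper."""
    counts = Counter()
    for line in lines:
        match = FETCH_RE.search(line)
        if match:
            counts[(match.group("minute"), match.group("pool"))] += 1
    return counts


def repeated_fetches(lines, threshold=10):
    """Return (minute, pool) pairs seen at least `threshold` times.

    The default threshold is arbitrary; pick one that fits the number of
    pools and the metrics interval on your node.
    """
    counts = count_fetches_per_minute(lines)
    return {key: n for key, n in counts.items() if n >= threshold}
```

Passing an open log file (for example `repeated_fetches(open(path))`, with `path` pointing at your agent log) should list the minutes in which the same pool was requested many times. If those counts keep growing the longer the agent runs, that would be consistent with the overlap described above.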
