bradh352 commented on issue #11141: URL: https://github.com/apache/cloudstack/issues/11141#issuecomment-3568380154
@TadiosAbebe my environment matches yours .... ubuntu 24.04, hypercoverged ceph. And the reported behavior also matches. That said, using the force reconnect never did work for me as a resolution. And when a host got in that state, just restarting the agents wouldn't work for me and I'm not exactly sure why. I never did fully isolate it. Sometimes I had to restart the management servers which was really concerning, I think I tried restarting libvirtd too. I set up a cron job to restart the agent daily at 5 min intervals across hosts: https://github.com/bradh352/ansible-role-service-cloudstack/commit/fdb14d92f115a5845d95880a05f467b42a84aff1 This has helped considerably. That said, I just went through to "test" everything right now to see if everything was healthy, which entails trying to open the console on one vm per host, and I found a host in a bad state ... ugh. So restarting the agent daily is definitely not a resolution. Out of curiosity I decided to restart libvirtd, and that *did* fix it on the bad host. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
