Re: [I] Delay in determining a KVM host is down [cloudstack]

via GitHub Tue, 25 Mar 2025 22:00:10 -0700


rajujith commented on issue #10477:
URL: https://github.com/apache/cloudstack/issues/10477#issuecomment-2753247589


   @alsko-icom With the latest nightly build and no configuration changes, VM HA 
starts about 5 minutes after a KVM host crashes. I simulated the host crash by 
powering off the nested KVM host I used for testing. 
   
   If you want to reduce the delay further, you can set the global configuration 
'commands.wait' to a suitable value, such as 
"CheckHealthCommand=5,CheckOnHostCommand=5". With that setting, I saw VM HA 
triggered 2 minutes and 30 seconds after the host crash. 
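
   As a rough sketch of the value format shown above (comma-separated 
`CommandName=seconds` pairs), the following Python helpers build and parse such 
a string. The function names are hypothetical and only illustrate the format; 
they do not call any CloudStack API, and the setting itself still has to be 
applied through the usual global configuration mechanism.

```python
def format_commands_wait(timeouts):
    """Build a 'Name=seconds,Name=seconds' string from a dict.

    Hypothetical helper: it only mirrors the value format quoted in the
    comment above and does not talk to CloudStack.
    """
    parts = []
    for name, seconds in timeouts.items():
        if not isinstance(seconds, int) or seconds <= 0:
            raise ValueError(f"timeout for {name} must be a positive integer")
        parts.append(f"{name}={seconds}")
    return ",".join(parts)


def parse_commands_wait(value):
    """Parse a 'Name=seconds,Name=seconds' string back into a dict."""
    result = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma or an empty string
        name, _, seconds = pair.partition("=")
        result[name.strip()] = int(seconds)
    return result


if __name__ == "__main__":
    value = format_commands_wait(
        {"CheckHealthCommand": 5, "CheckOnHostCommand": 5}
    )
    print(value)
```

   Parsing the string back before applying it is a cheap way to catch a typo 
such as a missing `=` or a non-numeric timeout.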
   
   Once a host crashes, CloudStack first has to identify that the host is 
unreachable, which is determined by 'ping.interval * ping.timeout' [1]. Reducing 
these values (use appropriate values based on your testing) makes host crash 
detection faster. CloudStack then starts multiple investigations, which use 
commands such as 'CheckHealthCommand', 'CheckOnHostCommand', and a few more. You 
can see these in the sample log file in the issue description. 
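
   To make the 'ping.interval * ping.timeout' product concrete, here is a small 
Python sketch of the arithmetic. The function names are hypothetical, and the 
values 60 and 2.5 are illustrative assumptions rather than numbers taken from 
this thread; check the actual settings in your own installation. The result 
covers only the time until the host is considered unreachable, not the 
investigations that follow.

```python
def unreachable_after_seconds(ping_interval, ping_timeout):
    """Seconds until a silent host is treated as unreachable.

    ping_interval: seconds between pings (the 'ping.interval' setting).
    ping_timeout:  multiplier applied to the interval ('ping.timeout').
    """
    if ping_interval <= 0 or ping_timeout <= 0:
        raise ValueError("both settings must be positive")
    return ping_interval * ping_timeout


def as_min_sec(seconds):
    """Format a duration in seconds as 'M min S s'."""
    minutes, rest = divmod(int(round(seconds)), 60)
    return f"{minutes} min {rest} s"


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from the thread).
    window = unreachable_after_seconds(60, 2.5)
    print(as_min_sec(window))
    # Halving the interval halves the detection window.
    print(as_min_sec(unreachable_after_seconds(30, 2.5)))
```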
   
   Updating ping.interval or ping.timeout requires a restart of the 
cloudstack-management service. 
   
   I didn't verify anything for host HA; I tested only VM HA. 
   
   [1] 
https://cwiki.apache.org/confluence/display/CLOUDSTACK/High+Availability+Developer's+Guide
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
