Does KVMInvestigator work on all shared primary storage, or just NFS?
I'm only familiar with the NFS KVMHA directories.

From this it seems like a clean stop of the KVM agent still shouldn't
trigger any issues/HA, correct?

On Wed, Aug 7, 2013 at 2:28 PM, Edison Su <edison...@citrix.com> wrote:
> There is a long-standing issue related to KVM HA; see bug CLOUDSTACK-3535.
> Basically, HA won't be triggered if the KVM agent is stopped, whether
> normally or abnormally. HA is only triggered if the network between the mgt
> server and the KVM host is disconnected and the network between KVM hosts in
> the same cluster is disconnected.
> Here is how KVM HA works after the fix for CLOUDSTACK-3535:
> 1. If the agent is stopped, it sends a shutdown request to the mgt server.
> The mgt server marks the host as disconnected while still keeping the host
> in the pingmap. Code is in AgentManagerImpl->AgentHandler->ProcessRequest->
> disconnectWithoutInvestigation
> 2. After ping.interval, the mgt server finds that the host's ping has timed
> out, then starts an HA investigation for the host. Code is in
> AgentMonitor->run->disconnectWithInvestigation
> 3. The mgt server calls all the available Investigators to investigate the
> status of the host.
>      The investigators currently called for a KVM host are:
>         UserVmDomRInvestigator->isAgentAlive, which sends a PingTestCommand
> to the host's neighbor. PingTestCommand pings the host's private IP address;
> if the ping succeeds, the host is up, otherwise the host's state is unknown.
> So this investigator can only detect that a host is up.
>         KVMInvestigator, which is newly added, sends a CheckOnHostCommand to
> the host's neighbor. CheckOnHostCommand checks the heartbeat of the host
> (the heartbeat is stored on shared primary storage). Ideally, it can detect
> whether the host is down or up.
>
>      Combining UserVmDomRInvestigator and KVMInvestigator, the mgt server
> should be able to find out the status of the host. But there is a case where
> these two investigators can report the wrong status:
>           The host is in a network partition while the KVM agent is down
> (thus the heartbeat has stopped).
> 4. After the investigators report the status of the host, if the host is
> down, HA starts for the VMs created on the host.
>
>
> Improvement:
>      Per a suggestion from Lennert den Teuling, we'd better use IPMI to
> detect host status, which is more reliable than ping and heartbeat, as IPMI
> has its own network and is less likely to suffer a network partition.
>
>
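
For reference, the investigation flow in steps 3 and 4 of the quoted message
can be sketched roughly as below. This is only an illustration, not the
CloudStack code: the class, interface, enum and method names are all
hypothetical, and I am assuming that each investigator either gives a definite
answer or reports "unknown", and that the first definite answer wins.

```java
import java.util.List;
import java.util.Optional;

public class HaInvestigationSketch {
    // Hypothetical status values; "unknown" is modelled as an empty Optional.
    enum HostStatus { UP, DOWN }

    /** Hypothetical investigator: returns empty when it cannot tell. */
    interface Investigator {
        Optional<HostStatus> investigate(String hostId);
    }

    /** Ask each investigator in order; the first definite answer wins. */
    static Optional<HostStatus> investigate(List<Investigator> investigators,
                                            String hostId) {
        for (Investigator inv : investigators) {
            Optional<HostStatus> result = inv.investigate(hostId);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty(); // nobody could decide
    }

    /** HA is started only when the host is positively reported as down. */
    static boolean shouldStartHa(Optional<HostStatus> status) {
        return status.isPresent() && status.get() == HostStatus.DOWN;
    }

    public static void main(String[] args) {
        // Ping-style check: can confirm UP, otherwise unknown.
        // Here the ping failed, so it has no answer.
        Investigator pingStyle = host -> Optional.empty();
        // Heartbeat-style check: here the heartbeat on shared storage is stale.
        Investigator heartbeatStyle = host -> Optional.of(HostStatus.DOWN);

        Optional<HostStatus> status =
            investigate(List.of(pingStyle, heartbeatStyle), "host-1");
        System.out.println(status + " startHa=" + shouldStartHa(status));
    }
}
```

Note that the sketch reproduces the weakness mentioned in the quoted message:
a partitioned host whose agent is down looks the same as a dead host, because
the ping gets no answer and the heartbeat is stale.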
