I'm not sure we can rely on IPMI to tell us much about the host status
itself. It's easy to use it for checking basic poweron/poweroff,
temperature, etc., but not so easy to tell if something is wrong with
the OS, the config, or at the software level.

However, early on in that thread I did mention support for sending an
IPMI poweroff, as a safety precaution, to hosts that CloudStack has
determined are down and has started migrating VMs off of.
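
To make the ordering concrete, here is a minimal sketch of what I have in
mind. Every name in it is hypothetical (none of these are CloudStack
classes or APIs), and it assumes two things that are not settled anywhere
in this thread: that the first investigator to give a definite answer
wins, and that HA is skipped if the IPMI poweroff fails.

```python
from enum import Enum
from typing import Callable, Iterable, Optional


class HostStatus(Enum):
    UP = "Up"
    DOWN = "Down"


# Hypothetical type: an investigator takes a host name and returns UP,
# DOWN, or None when it cannot tell (e.g. a ping check that got no reply).
Investigator = Callable[[str], Optional[HostStatus]]


def investigate(host: str,
                investigators: Iterable[Investigator]) -> Optional[HostStatus]:
    """Ask each investigator in turn; the first definite answer wins.

    "First definite answer wins" is an assumption of this sketch.
    """
    for check in investigators:
        status = check(host)
        if status is not None:
            return status
    return None


def handle_ping_timeout(host: str,
                        investigators: Iterable[Investigator],
                        ipmi_power_off: Callable[[str], bool],
                        start_ha: Callable[[str], None]) -> bool:
    """Fence the host over IPMI, then start HA, only if it is found Down.

    Returns True if HA was started.
    """
    if investigate(host, investigators) is not HostStatus.DOWN:
        return False
    # Safety precaution: power the host off before its VMs are restarted
    # elsewhere. Skipping HA when fencing fails is an assumption here.
    if not ipmi_power_off(host):
        return False
    start_ha(host)
    return True
```

Note that IPMI is only used here to power off a host that has already
been declared down, not to decide whether it is down.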

On Wed, Aug 7, 2013 at 2:41 PM, Marcus Sorensen <shadow...@gmail.com> wrote:
> Does KVMInvestigator work on all shared primary storage, or just NFS?
> I'm only familiar with the NFS KVMHA directories.
>
> From this it seems like a clean stop of the KVM agent still shouldn't
> trigger any issues/HA, correct?
>
> On Wed, Aug 7, 2013 at 2:28 PM, Edison Su <edison...@citrix.com> wrote:
>> There is a long-standing issue related to KVM HA, see bug CLOUDSTACK-3535.
>> Basically, HA won't be triggered if the KVM agent is stopped, either
>> normally or abnormally; HA is only triggered if the network between the
>> mgt server and the KVM host is disconnected and the network between KVM
>> hosts in the same cluster is disconnected.
>> Here is how KVM HA works after the fix for CLOUDSTACK-3535:
>> 1. If the agent is stopped, the agent will send a shutdown request to the
>> mgt server; the mgt server will mark the host as disconnected, while still
>> keeping the host in the pingmap. Code is in
>> AgentManagerImpl->AgentHandler->ProcessRequest->disconnectWithoutInvestigation
>> 2. After ping.interval, the mgt server will find that the host has timed
>> out on ping, then start HA investigation for the host. Code is in
>> AgentMonitor->run->disconnectWithInvestigation
>> 3. The mgt server will call all the available investigators to investigate
>> the status of the host.
>>      The current investigators called for a KVM host are:
>>         UserVmDomRInvestigator->isAgentAlive, which will send a
>> PingTestCommand to the host's neighbor. PingTestCommand will ping the
>> host's private IP address; if the ping succeeds, the host is up,
>> otherwise the host's state is unknown. So this investigator can only
>> detect that a host is up.
>>         KVMInvestigator, which is newly added, will send a
>> CheckOnHostCommand to the host's neighbor. CheckOnHostCommand will check
>> the heartbeat of the host (the heartbeat is stored on shared primary
>> storage). Ideally, it will detect whether the host is down or up.
>>
>>      Combining UserVmDomRInvestigator and KVMInvestigator, the mgt server
>> should be able to find out the status of the host. But there is a case
>> where these two investigators can report the wrong status:
>>           The host is in a network partition while the KVM agent is down
>> (thus the heartbeat has stopped)
>> 4. After the investigators report the status of the host, if the host is
>> down, then start HA for the VMs created on the host.
>>
>>
>> Improvement:
>>      Per a suggestion from Lennert den Teuling, we'd better use IPMI to
>> detect host status, which is more reliable than ping and heartbeat, as
>> IPMI has its own network and is less likely to be partitioned.
>>
>>
