[jira] [Comment Edited] (CLOUDSTACK-3535) No HA actions are performed when a KVM host goes offline

Lennert den Teuling (JIRA) Mon, 05 Aug 2013 05:12:57 -0700

    [ 
https://issues.apache.org/jira/browse/CLOUDSTACK-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729442#comment-13729442
 ]


Lennert den Teuling edited comment on CLOUDSTACK-3535 at 8/5/13 12:10 PM:
--------------------------------------------------------------------------

This is the code that causes nothing to happen 
(UserVmDomRInvestigator.java):

        if (s_logger.isDebugEnabled()) {
            s_logger.debug("could not reach agent, could not reach agent's host, returning that we don't have enough information");
        }
        return null;

I think nothing happens because null is returned, so I simply replaced the 
return value with "Status.Down", and HA works fine.
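
A minimal sketch of why a null result could lead to no HA action at all. It 
assumes a chain of investigators where null means "not enough information" 
and HA only starts on a definite "down" answer. All names here (HostStatus, 
Investigator, InvestigatorChain) are hypothetical and simplified; this is not 
CloudStack's actual code.

```java
import java.util.List;

// Hypothetical, simplified model; these are not CloudStack's real types.
enum HostStatus { UP, DOWN }

interface Investigator {
    // Returns UP or DOWN, or null when this investigator cannot decide.
    HostStatus isAgentAlive(String hostName);
}

final class InvestigatorChain {
    // Asks each investigator in turn; the first non-null answer wins.
    static HostStatus investigate(List<Investigator> investigators, String hostName) {
        for (Investigator investigator : investigators) {
            HostStatus status = investigator.isAgentAlive(hostName);
            if (status != null) {
                return status;
            }
        }
        return null; // no investigator could decide
    }

    // HA is only started on a definite DOWN; null is treated as "do nothing".
    static boolean shouldStartHa(HostStatus status) {
        return status == HostStatus.DOWN;
    }
}
```

In this model, a chain in which every investigator returns null never starts 
HA, while a single investigator returning DOWN is enough to start it.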

Maybe I'm looking at this issue too simply, but why would an unreachable agent 
and an unpingable host not be enough to trigger HA? The only logical reason I 
can think of is that ugly things could happen when network issues occur. But 
there is still the KVMHAChecker, which uses the filesystem to check for the 
heartbeat of the node. 

So if you combined the output of the UserVmDomRInvestigator with that of 
the KVMHAChecker, would this be enough to return "host.down" instead of "null" 
and fix this issue? 
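
The combination asked about above could be sketched roughly as follows. This 
is only an illustration under stated assumptions: the network checks and the 
storage heartbeat check are passed in as plain values, and the names 
(NodeState, CombinedHostCheck, decide) are hypothetical rather than taken from 
UserVmDomRInvestigator or KVMHAChecker.

```java
// Hypothetical sketch; not the actual CloudStack implementation.
enum NodeState { UP, DOWN, UNKNOWN }

final class CombinedHostCheck {
    // heartbeatFresh: TRUE if the host recently wrote its heartbeat to shared
    // storage, FALSE if the heartbeat is stale, null if storage was unreadable.
    static NodeState decide(boolean agentReachable, boolean hostPingable, Boolean heartbeatFresh) {
        if (agentReachable || hostPingable) {
            return NodeState.UP; // the network still sees the host
        }
        if (heartbeatFresh == null) {
            return NodeState.UNKNOWN; // no second opinion available
        }
        // Unreachable over the network: let the storage heartbeat decide.
        // A fresh heartbeat suggests a network problem, not a dead host.
        return heartbeatFresh ? NodeState.UP : NodeState.DOWN;
    }
}
```

With this rule, "down" is only reported when both the network checks and the 
storage heartbeat agree, which is meant to limit the risk of starting a VM a 
second time while the original host is still running it.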

Ideally you would turn off the host through IPMI to make sure it's dead, but 
could this be a solution for now?
                
                  
> No HA actions are performed when a KVM host goes offline
> --------------------------------------------------------
>
>                 Key: CLOUDSTACK-3535
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3535
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>          Components: Hypervisor Controller, KVM, Management Server
>    Affects Versions: 4.1.0, 4.1.1, 4.2.0
>         Environment: KVM (CentOS 6.3) with CloudStack 4.1
>            Reporter: Paul Angus
>            Priority: Blocker
>             Fix For: 4.2.0
>
>         Attachments: management-server.log.Agent
>
>
> If a KVM host 'goes down', CloudStack does not perform HA for instances which 
> are marked as HA enabled on that host (including system VMs).
> CloudStack does not show the host as disconnected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
