[jira] [Comment Edited] (CLOUDSTACK-3535) No HA actions are performed when a KVM host goes offline

2014-04-10 Thread Sudha Ponnaganti (JIRA)

[ https://issues.apache.org/jira/browse/CLOUDSTACK-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965421#comment-13965421 ]

Sudha Ponnaganti edited comment on CLOUDSTACK-3535 at 4/10/14 3:04 PM:
---

Paul, can you confirm that this is actually working for you?


was (Author: sudhap):
Paul, can you confirm that this is actually happening for you?

 No HA actions are performed when a KVM host goes offline
 

 Key: CLOUDSTACK-3535
 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3535
 Project: CloudStack
  Issue Type: Bug
  Security Level: Public(Anyone can view this level - this is the 
 default.) 
  Components: Hypervisor Controller, KVM, Management Server
Affects Versions: 4.1.0, 4.1.1, 4.2.0
 Environment: KVM (CentOS 6.3) with CloudStack 4.1
Reporter: Paul Angus
Assignee: edison su
Priority: Blocker
 Fix For: 4.2.0

 Attachments: KVM-HA-4.1.1.2013-08-09-v1.patch, 
 extract-management-server.log.2013-08-09, management-server.log.Agent


 If a KVM host 'goes down', CloudStack does not perform HA for instances which 
 are marked as HA enabled on that host (including system VMs)
 CloudStack does not show the host as disconnected.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (CLOUDSTACK-3535) No HA actions are performed when a KVM host goes offline

2014-04-10 Thread Sudha Ponnaganti (JIRA)

[ https://issues.apache.org/jira/browse/CLOUDSTACK-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965421#comment-13965421 ]

Sudha Ponnaganti edited comment on CLOUDSTACK-3535 at 4/10/14 3:04 PM:
---

Paul, can you confirm that this is actually happening for you?


was (Author: sudhap):
Paul, can you confirm that this is actually happening?






[jira] [Comment Edited] (CLOUDSTACK-3535) No HA actions are performed when a KVM host goes offline

2013-08-05 Thread Lennert den Teuling (JIRA)

[ https://issues.apache.org/jira/browse/CLOUDSTACK-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729442#comment-13729442 ]

Lennert den Teuling edited comment on CLOUDSTACK-3535 at 8/5/13 12:10 PM:
--

This is the code responsible for nothing happening
(UserVmDomRInvestigator.java):

    if (s_logger.isDebugEnabled()) {
        s_logger.debug("could not reach agent, could not reach agent's host, " +
                "returning that we don't have enough information");
    }
    return null;

I think because null is returned, nothing happens, so I've simply replaced this
with Status.Down, and HA works fine.

Maybe I'm looking at this issue too simply, but why would an unreachable agent
and an unpingable host not be enough to trigger HA? The only logical reason I
can think of is that ugly things could happen when network issues occur. But
there is still the KVMHAChecker, which uses the filesystem to check for the
node's heartbeat.

So if you combined the output of the UserVmDomRInvestigator with that of the
KVMHAChecker, would this be enough to return host.down instead of null and fix
this issue?

Ideally you would turn off the host through IPMI to make sure it's dead, but
for now could this be a solution?
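
A minimal sketch of the combination proposed above, assuming a simplified
model in which each check either gives a definite answer or says it cannot
tell. All names here (HostStateSketch, Investigator, determineState) are
hypothetical and are not CloudStack's actual classes or signatures; the two
lambdas only stand in for the ping-based check and the storage heartbeat
check.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch; the names are illustrative, not CloudStack's API. */
public class HostStateSketch {

    enum Status { UP, DOWN }

    /** One source of evidence about a host; empty means "cannot tell". */
    interface Investigator {
        Optional<Status> investigate(String hostId);
    }

    /**
     * Asks each investigator in order and returns the first definite answer.
     * An empty result means no investigator had enough information.
     */
    static Optional<Status> determineState(String hostId, List<Investigator> investigators) {
        for (Investigator investigator : investigators) {
            Optional<Status> result = investigator.investigate(hostId);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed stand-in for the ping-based check: nothing reachable, no verdict.
        Investigator pingCheck = hostId -> Optional.empty();
        // Assumed stand-in for the heartbeat check: a stale heartbeat means DOWN.
        Investigator heartbeatCheck = hostId -> Optional.of(Status.DOWN);

        System.out.println(determineState("host-1", List.of(pingCheck, heartbeatCheck)));
    }
}
```

With this shape, an inconclusive ping check no longer ends the investigation;
the heartbeat check gets the final say. Whether a stale heartbeat alone is
safe enough to act on is exactly the open question in this comment.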

  was (Author: lennert):
This is the code responsible for nothing happening
(UserVmDomRInvestigator.java):

    if (s_logger.isDebugEnabled()) {
        s_logger.debug("could not reach agent, could not reach agent's host, " +
                "returning that we don't have enough information");
    }
    return null;

I think because null is returned, nothing happens; I've simply replaced this
with Status.Down, and HA works fine.

Maybe I'm looking at this issue too simply, but why would an unreachable agent
and an unpingable host not be enough to trigger HA? The only logical reason I
can think of is that ugly things could happen when network issues occur. But
there is still the KVMHAChecker, which uses the filesystem to check for the
node's heartbeat.

So if you combined the output of the UserVmDomRInvestigator with that of the
KVMHAChecker, would this be enough to return host.down instead of null and fix
this issue?

Ideally you would turn off the host through IPMI to make sure it's dead, but
for now could this be a solution?
  
 No HA actions are performed when a KVM host goes offline
 

 Key: CLOUDSTACK-3535
 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3535
 Project: CloudStack
  Issue Type: Bug
  Security Level: Public(Anyone can view this level - this is the 
 default.) 
  Components: Hypervisor Controller, KVM, Management Server
Affects Versions: 4.1.0, 4.1.1, 4.2.0
 Environment: KVM (CentOS 6.3) with CloudStack 4.1
Reporter: Paul Angus
Priority: Blocker
 Fix For: 4.2.0

 Attachments: management-server.log.Agent


 If a KVM host 'goes down', CloudStack does not perform HA for instances which 
 are marked as HA enabled on that host (including system VMs)
 CloudStack does not show the host as disconnected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (CLOUDSTACK-3535) No HA actions are performed when a KVM host goes offline

2013-08-05 Thread Lennert den Teuling (JIRA)

[ https://issues.apache.org/jira/browse/CLOUDSTACK-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729442#comment-13729442 ]

Lennert den Teuling edited comment on CLOUDSTACK-3535 at 8/5/13 12:32 PM:
--

This is the code responsible for nothing happening
(UserVmDomRInvestigator.java):

    if (s_logger.isDebugEnabled()) {
        s_logger.debug("could not reach agent, could not reach agent's host, " +
                "returning that we don't have enough information");
    }
    return null;

Because null is returned, nothing happens, so I've simply replaced this with
Status.Down, and HA works fine.

Maybe I'm looking at this issue too simply, but why would an unreachable agent
and an unpingable host not be enough to trigger HA? The only logical reason I
can think of is that ugly things could happen when network issues occur. But
there is still the KVMHAChecker, which uses the filesystem to check for the
node's heartbeat.

So if you combined the output of the UserVmDomRInvestigator with that of the
KVMHAChecker, would this be enough to return host.down instead of null and fix
this issue?

Ideally you would turn off the host through IPMI to make sure it's dead, but
for now could this be a solution?

  was (Author: lennert):
This is the code responsible for nothing happening
(UserVmDomRInvestigator.java):

    if (s_logger.isDebugEnabled()) {
        s_logger.debug("could not reach agent, could not reach agent's host, " +
                "returning that we don't have enough information");
    }
    return null;

I think because null is returned, nothing happens, so I've simply replaced this
with Status.Down, and HA works fine.

Maybe I'm looking at this issue too simply, but why would an unreachable agent
and an unpingable host not be enough to trigger HA? The only logical reason I
can think of is that ugly things could happen when network issues occur. But
there is still the KVMHAChecker, which uses the filesystem to check for the
node's heartbeat.

So if you combined the output of the UserVmDomRInvestigator with that of the
KVMHAChecker, would this be enough to return host.down instead of null and fix
this issue?

Ideally you would turn off the host through IPMI to make sure it's dead, but
for now could this be a solution?
  



[jira] [Comment Edited] (CLOUDSTACK-3535) No HA actions are performed when a KVM host goes offline

2013-07-30 Thread Salvatore Sciacco (JIRA)

[ https://issues.apache.org/jira/browse/CLOUDSTACK-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723512#comment-13723512 ]

Salvatore Sciacco edited comment on CLOUDSTACK-3535 at 7/30/13 7:41 AM:


For shared storage using CLVM, corosync/cman is required; can't we use/require
it for HA? It has quorum, fencing, etc.

BTW, in any case some workaround/hack to release the VMs from the dead host is
required. Any suggestions?

  was (Author: sas2000):
For shared storage using CLVM, corosync/cman is required; can't we use/require
it for HA? It has quorum, fencing, etc.
  
 No HA actions are performed when a KVM host goes offline
 

 Key: CLOUDSTACK-3535
 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3535
 Project: CloudStack
  Issue Type: Bug
  Security Level: Public(Anyone can view this level - this is the 
 default.) 
  Components: Hypervisor Controller, KVM, Management Server
Affects Versions: 4.1.0, 4.1.1, 4.2.0
 Environment: KVM (CentOS 6.3) with CloudStack 4.1
Reporter: Paul Angus
Assignee: Paul Angus
Priority: Blocker
 Attachments: management-server.log.Agent


 If a KVM host 'goes down', CloudStack does not perform HA for instances which 
 are marked as HA enabled on that host (including system VMs)
 CloudStack does not show the host as disconnected.



[jira] [Comment Edited] (CLOUDSTACK-3535) No HA actions are performed when a KVM host goes offline

2013-07-24 Thread Marcus Sorensen (JIRA)

[ https://issues.apache.org/jira/browse/CLOUDSTACK-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719176#comment-13719176 ]

Marcus Sorensen edited comment on CLOUDSTACK-3535 at 7/25/13 2:49 AM:
--

Sounds like this is not KVM specific.

Not to be blunt, but I don't think Logan's solution works at all. We have no
way of knowing what's running on a host simply by whether or not we can ping it
on the management network. A host may be running 20 VMs, all healthy, while
only its management NIC has failed. Relying on ping makes too many assumptions
(that storage is Ethernet-based, and that the same interface/network is serving
both management and storage).

The only way to go is with proper fencing. For those storage types that support
it, revoke access for other hosts when a VM starts, so that even if it was
running elsewhere, you basically pull the power cord when you start up the VM
in the known-good location. This means that a host starting a VM has an
exclusive lock on the volumes associated with the VM. Additionally or
alternatively, use an IPMI service that powers off a host if the agent isn't in
maintenance mode and is non-communicative.
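
As a rough illustration of the exclusive-lock idea above, here is a minimal
in-memory sketch. The class and method names (VolumeFenceSketch, acquire,
mayAccess) are hypothetical and this is not CloudStack code; a real
implementation would have to make the storage backend itself refuse I/O from
the previous owner, which this sketch only models with a map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of volume fencing; not CloudStack code. */
public class VolumeFenceSketch {

    /** Maps each volume to the single host currently allowed to access it. */
    private final Map<String, String> ownerByVolume = new HashMap<>();

    /**
     * Grants hostId exclusive access to the volume and returns the host whose
     * access was revoked, if any. Real storage would be told to cut off the
     * previous owner before the VM is started in the new location.
     */
    public synchronized Optional<String> acquire(String volumeId, String hostId) {
        String previous = ownerByVolume.put(volumeId, hostId);
        if (previous == null || previous.equals(hostId)) {
            return Optional.empty();
        }
        return Optional.of(previous);
    }

    /** Returns true only if hostId currently holds the lock on the volume. */
    public synchronized boolean mayAccess(String volumeId, String hostId) {
        return hostId.equals(ownerByVolume.get(volumeId));
    }

    public static void main(String[] args) {
        VolumeFenceSketch fence = new VolumeFenceSketch();
        fence.acquire("vol-1", "host-a");                            // VM first starts on host-a
        Optional<String> revoked = fence.acquire("vol-1", "host-b"); // HA restarts it on host-b
        System.out.println("revoked: " + revoked);
        System.out.println("host-a may access: " + fence.mayAccess("vol-1", "host-a"));
        System.out.println("host-b may access: " + fence.mayAccess("vol-1", "host-b"));
    }
}
```

The point of the design is that starting the VM elsewhere is itself the
fencing action: the old host loses access whether or not it can be reached.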

In the meantime, as the short-term solution mentions, if we can put the host
into maintenance mode manually when it's known to be down, and allow VMs to
migrate, that would at least allow people to get their system working again
without DB hacks.

  was (Author: mlsorensen):
Sounds like this is not KVM specific.

Not to be blunt, but I don't think Logan's solution works at all. We have no
way of knowing what's running on a host simply by whether or not we can ping it
on the management network. A host may be running 20 VMs, all healthy, while
only its management NIC has failed. Relying on ping makes too many assumptions
(that storage is Ethernet-based, and that the same interface/network is serving
both management and storage).

The only way to go is with proper fencing. For those storage types that support
it, revoke access for other hosts when a VM starts, so that even if it was
running elsewhere, you basically pull the power cord when you start up the VM
in the known-good location. Additionally or alternatively, use an IPMI service
that powers off a host if the agent isn't in maintenance mode and is
non-communicative.

In the meantime, as the short-term solution mentions, if we can put the host
into maintenance mode manually when it's known to be down, and allow VMs to
migrate, that would at least allow people to get their system working again
without DB hacks.
  
 No HA actions are performed when a KVM host goes offline
 

 Key: CLOUDSTACK-3535
 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3535
 Project: CloudStack
  Issue Type: Bug
  Security Level: Public(Anyone can view this level - this is the 
 default.) 
  Components: Hypervisor Controller, KVM, Management Server
Affects Versions: 4.1.0, 4.1.1, 4.2.0
 Environment: KVM (CentOS 6.3) with CloudStack 4.1
Reporter: Paul Angus
Priority: Blocker

 If a KVM host 'goes down', CloudStack does not perform HA for instances which 
 are marked as HA enabled on that host (including system VMs)
 CloudStack does not show the host as disconnected.
