[jira] [Commented] (CLOUDSTACK-8713) HA is not working on CentOS 7 due to KVM Power state report not properly parsed (Exception)

2015-08-26 Thread Daan Hoogland (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715530#comment-14715530
 ] 

Daan Hoogland commented on CLOUDSTACK-8713:
---

The problem as described by Remi can not be reproduced. HA has some strange 
qualities, however.
In a setup as described;
- When a VM is stopped on the hypervisor, ha seems to work fine.
- When a kvm host is kick from under us (power off or network unplug) ha seems 
to work fine.
- When shutdown is performed on the host or the agent is stopped with service 
stop, ha doesn't work.

It seems to be a problem with the state machine handling host states. A host in 
Disconnected or Alert state does not recover nor do its VMs.

 HA is not working on CentOS 7 due to KVM Power state report not properly 
 parsed (Exception)
 ---

 Key: CLOUDSTACK-8713
 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8713
 Project: CloudStack
  Issue Type: Bug
  Security Level: Public(Anyone can view this level - this is the 
 default.) 
Affects Versions: 4.6.0
 Environment: KVM on CentOS 7, management server running latest master 
 aka 4.6.0
Reporter: Remi Bergsma
Priority: Critical

 While testing a PR, I found that HA on KVM does not work properly. 
 Steps to reproduce:
 - Spin up some VMs on KVM using a HA offering
 - go to KVM hypervisor and kill one of them to simulate a crash
 virsh destroy 6 (change number)
 - look how cloudstack handles this missing VM
 Result:
 - VM stays down and is not started
 Expected result:
 - VM should be started somewhere
 Cause:
 It doesn’t parse the power report property it gets from the hypervisor, so it 
 never marks it Stopped. HA will not start, VM will stay down.
 Database reports PowerStateMissing. Starting manually works fine.
 select name,power_state,instance_name,state from vm_instance where 
 name='test003';
 | name| power_state| instance_name | state   |
 | test003 | PowerReportMissing | i-2-6-VM  | Running |
 1 row in set (0.00 sec)
 I also tried to crash a KVM hypervisor and then the same thing happens.
 Haven’t tested it on other hypervisors. Could anyone verify this?
 Logs:
 2015-08-06 15:40:46,809 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] 
 (AgentManager-Handler-16:null) VM state report is updated. host: 1, vm id: 6, 
 power state: PowerReportMissing 
 2015-08-06 15:40:46,815 INFO  [c.c.v.VirtualMachineManagerImpl] 
 (AgentManager-Handler-16:null) VM i-2-6-VM is at Running and we received a 
 power-off report while there is no pending jobs on it
 2015-08-06 15:40:46,815 INFO  [c.c.v.VirtualMachineManagerImpl] 
 (AgentManager-Handler-16:null) Detected out-of-band stop of a HA enabled VM 
 i-2-6-VM, will schedule restart
 2015-08-06 15:40:46,824 INFO  [c.c.h.HighAvailabilityManagerImpl] 
 (AgentManager-Handler-16:null) Schedule vm for HA:  VM[User|i-2-6-VM]
 2015-08-06 15:40:46,824 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] 
 (AgentManager-Handler-16:null) Done with process of VM state report. host: 1
 2015-08-06 15:40:46,851 INFO  [c.c.h.HighAvailabilityManagerImpl] 
 (HA-Worker-3:ctx-4e073b92 work-37) Processing 
 HAWork[37-HA-6-Running-Investigating]
 2015-08-06 15:40:46,871 INFO  [c.c.h.HighAvailabilityManagerImpl] 
 (HA-Worker-3:ctx-4e073b92 work-37) HA on VM[User|i-2-6-VM]
 2015-08-06 15:40:46,880 DEBUG [c.c.a.t.Request] (HA-Worker-3:ctx-4e073b92 
 work-37) Seq 1-6463228415230083145: Sending  { Cmd , MgmtId: 3232241215, via: 
 1(kvm2), Ver: v1, Flags: 100011, 
 [{com.cloud.agent.api.CheckVirtualMachineCommand:{vmName:i-2-6-VM,wait:20}}]
  }
 2015-08-06 15:40:46,908 ERROR [c.c.a.t.Request] 
 (AgentManager-Handler-17:null) Unable to convert to json: 
 [{com.cloud.agent.api.CheckVirtualMachineAnswer:{state:Stopped,result:true,contextMap:{},wait:0}}]
 2015-08-06 15:40:46,909 WARN  [c.c.u.n.Task] (AgentManager-Handler-17:null) 
 Caught the following exception but pushing on
 com.google.gson.JsonParseException: The JsonDeserializer EnumTypeAdapter 
 failed to deserialize json object Stopped given the type class 
 com.cloud.vm.VirtualMachine$PowerState
 at 
 com.google.gson.JsonDeserializerExceptionWrapper.deserialize(JsonDeserializerExceptionWrapper.java:64)
 at 
 com.google.gson.JsonDeserializationVisitor.invokeCustomDeserializer(JsonDeserializationVisitor.java:92)
 at 
 com.google.gson.JsonObjectDeserializationVisitor.visitFieldUsingCustomHandler(JsonObjectDeserializationVisitor.java:117)
 at 
 com.google.gson.ReflectingFieldNavigator.visitFieldsReflectively(ReflectingFieldNavigator.java:63)
 at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:120)
 at 
 

[jira] [Commented] (CLOUDSTACK-8713) HA is not working on CentOS 7 due to KVM Power state report not properly parsed (Exception)

2015-08-26 Thread Daan Hoogland (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712991#comment-14712991
 ] 

Daan Hoogland commented on CLOUDSTACK-8713:
---

seems like the enum we use;
public enum PowerState {
PowerUnknown,
PowerOn,
PowerOff,
PowerReportMissing
}
is no longer valid for the new versions of the hypervisor. The value 'Stopped' 
has been added or is replacing one, probably 'PowerOff'.

 HA is not working on CentOS 7 due to KVM Power state report not properly 
 parsed (Exception)
 ---

 Key: CLOUDSTACK-8713
 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8713
 Project: CloudStack
  Issue Type: Bug
  Security Level: Public(Anyone can view this level - this is the 
 default.) 
Affects Versions: 4.6.0
 Environment: KVM on CentOS 7, management server running latest master 
 aka 4.6.0
Reporter: Remi Bergsma
Priority: Critical

 While testing a PR, I found that HA on KVM does not work properly. 
 Steps to reproduce:
 - Spin up some VMs on KVM using a HA offering
 - go to KVM hypervisor and kill one of them to simulate a crash
 virsh destroy 6 (change number)
 - look how cloudstack handles this missing VM
 Result:
 - VM stays down and is not started
 Expected result:
 - VM should be started somewhere
 Cause:
 It doesn’t parse the power report property it gets from the hypervisor, so it 
 never marks it Stopped. HA will not start, VM will stay down.
 Database reports PowerStateMissing. Starting manually works fine.
 select name,power_state,instance_name,state from vm_instance where 
 name='test003';
 | name| power_state| instance_name | state   |
 | test003 | PowerReportMissing | i-2-6-VM  | Running |
 1 row in set (0.00 sec)
 I also tried to crash a KVM hypervisor and then the same thing happens.
 Haven’t tested it on other hypervisors. Could anyone verify this?
 Logs:
 2015-08-06 15:40:46,809 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] 
 (AgentManager-Handler-16:null) VM state report is updated. host: 1, vm id: 6, 
 power state: PowerReportMissing 
 2015-08-06 15:40:46,815 INFO  [c.c.v.VirtualMachineManagerImpl] 
 (AgentManager-Handler-16:null) VM i-2-6-VM is at Running and we received a 
 power-off report while there is no pending jobs on it
 2015-08-06 15:40:46,815 INFO  [c.c.v.VirtualMachineManagerImpl] 
 (AgentManager-Handler-16:null) Detected out-of-band stop of a HA enabled VM 
 i-2-6-VM, will schedule restart
 2015-08-06 15:40:46,824 INFO  [c.c.h.HighAvailabilityManagerImpl] 
 (AgentManager-Handler-16:null) Schedule vm for HA:  VM[User|i-2-6-VM]
 2015-08-06 15:40:46,824 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] 
 (AgentManager-Handler-16:null) Done with process of VM state report. host: 1
 2015-08-06 15:40:46,851 INFO  [c.c.h.HighAvailabilityManagerImpl] 
 (HA-Worker-3:ctx-4e073b92 work-37) Processing 
 HAWork[37-HA-6-Running-Investigating]
 2015-08-06 15:40:46,871 INFO  [c.c.h.HighAvailabilityManagerImpl] 
 (HA-Worker-3:ctx-4e073b92 work-37) HA on VM[User|i-2-6-VM]
 2015-08-06 15:40:46,880 DEBUG [c.c.a.t.Request] (HA-Worker-3:ctx-4e073b92 
 work-37) Seq 1-6463228415230083145: Sending  { Cmd , MgmtId: 3232241215, via: 
 1(kvm2), Ver: v1, Flags: 100011, 
 [{com.cloud.agent.api.CheckVirtualMachineCommand:{vmName:i-2-6-VM,wait:20}}]
  }
 2015-08-06 15:40:46,908 ERROR [c.c.a.t.Request] 
 (AgentManager-Handler-17:null) Unable to convert to json: 
 [{com.cloud.agent.api.CheckVirtualMachineAnswer:{state:Stopped,result:true,contextMap:{},wait:0}}]
 2015-08-06 15:40:46,909 WARN  [c.c.u.n.Task] (AgentManager-Handler-17:null) 
 Caught the following exception but pushing on
 com.google.gson.JsonParseException: The JsonDeserializer EnumTypeAdapter 
 failed to deserialize json object Stopped given the type class 
 com.cloud.vm.VirtualMachine$PowerState
 at 
 com.google.gson.JsonDeserializerExceptionWrapper.deserialize(JsonDeserializerExceptionWrapper.java:64)
 at 
 com.google.gson.JsonDeserializationVisitor.invokeCustomDeserializer(JsonDeserializationVisitor.java:92)
 at 
 com.google.gson.JsonObjectDeserializationVisitor.visitFieldUsingCustomHandler(JsonObjectDeserializationVisitor.java:117)
 at 
 com.google.gson.ReflectingFieldNavigator.visitFieldsReflectively(ReflectingFieldNavigator.java:63)
 at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:120)
 at 
 com.google.gson.JsonDeserializationContextDefault.fromJsonObject(JsonDeserializationContextDefault.java:76)
 at 
 com.google.gson.JsonDeserializationContextDefault.deserialize(JsonDeserializationContextDefault.java:54)
 at com.google.gson.Gson.fromJson(Gson.java:551)
 at 

[jira] [Commented] (CLOUDSTACK-8713) HA is not working on CentOS 7 due to KVM Power state report not properly parsed (Exception)

2015-08-26 Thread Remi Bergsma (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715727#comment-14715727
 ] 

Remi Bergsma commented on CLOUDSTACK-8713:
--

Not sure how I got into this state. If I deploy a new cloud based on latest 
master, I'm also not able to reproduce.

 HA is not working on CentOS 7 due to KVM Power state report not properly 
 parsed (Exception)
 ---

 Key: CLOUDSTACK-8713
 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8713
 Project: CloudStack
  Issue Type: Bug
  Security Level: Public(Anyone can view this level - this is the 
 default.) 
Affects Versions: 4.6.0
 Environment: KVM on CentOS 7, management server running latest master 
 aka 4.6.0
Reporter: Remi Bergsma
Priority: Critical

 While testing a PR, I found that HA on KVM does not work properly. 
 Steps to reproduce:
 - Spin up some VMs on KVM using a HA offering
 - go to KVM hypervisor and kill one of them to simulate a crash
 virsh destroy 6 (change number)
 - look how cloudstack handles this missing VM
 Result:
 - VM stays down and is not started
 Expected result:
 - VM should be started somewhere
 Cause:
 It doesn’t parse the power report property it gets from the hypervisor, so it 
 never marks it Stopped. HA will not start, VM will stay down.
 Database reports PowerStateMissing. Starting manually works fine.
 select name,power_state,instance_name,state from vm_instance where 
 name='test003';
 | name| power_state| instance_name | state   |
 | test003 | PowerReportMissing | i-2-6-VM  | Running |
 1 row in set (0.00 sec)
 I also tried to crash a KVM hypervisor and then the same thing happens.
 Haven’t tested it on other hypervisors. Could anyone verify this?
 Logs:
 2015-08-06 15:40:46,809 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] 
 (AgentManager-Handler-16:null) VM state report is updated. host: 1, vm id: 6, 
 power state: PowerReportMissing 
 2015-08-06 15:40:46,815 INFO  [c.c.v.VirtualMachineManagerImpl] 
 (AgentManager-Handler-16:null) VM i-2-6-VM is at Running and we received a 
 power-off report while there is no pending jobs on it
 2015-08-06 15:40:46,815 INFO  [c.c.v.VirtualMachineManagerImpl] 
 (AgentManager-Handler-16:null) Detected out-of-band stop of a HA enabled VM 
 i-2-6-VM, will schedule restart
 2015-08-06 15:40:46,824 INFO  [c.c.h.HighAvailabilityManagerImpl] 
 (AgentManager-Handler-16:null) Schedule vm for HA:  VM[User|i-2-6-VM]
 2015-08-06 15:40:46,824 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl] 
 (AgentManager-Handler-16:null) Done with process of VM state report. host: 1
 2015-08-06 15:40:46,851 INFO  [c.c.h.HighAvailabilityManagerImpl] 
 (HA-Worker-3:ctx-4e073b92 work-37) Processing 
 HAWork[37-HA-6-Running-Investigating]
 2015-08-06 15:40:46,871 INFO  [c.c.h.HighAvailabilityManagerImpl] 
 (HA-Worker-3:ctx-4e073b92 work-37) HA on VM[User|i-2-6-VM]
 2015-08-06 15:40:46,880 DEBUG [c.c.a.t.Request] (HA-Worker-3:ctx-4e073b92 
 work-37) Seq 1-6463228415230083145: Sending  { Cmd , MgmtId: 3232241215, via: 
 1(kvm2), Ver: v1, Flags: 100011, 
 [{com.cloud.agent.api.CheckVirtualMachineCommand:{vmName:i-2-6-VM,wait:20}}]
  }
 2015-08-06 15:40:46,908 ERROR [c.c.a.t.Request] 
 (AgentManager-Handler-17:null) Unable to convert to json: 
 [{com.cloud.agent.api.CheckVirtualMachineAnswer:{state:Stopped,result:true,contextMap:{},wait:0}}]
 2015-08-06 15:40:46,909 WARN  [c.c.u.n.Task] (AgentManager-Handler-17:null) 
 Caught the following exception but pushing on
 com.google.gson.JsonParseException: The JsonDeserializer EnumTypeAdapter 
 failed to deserialize json object Stopped given the type class 
 com.cloud.vm.VirtualMachine$PowerState
 at 
 com.google.gson.JsonDeserializerExceptionWrapper.deserialize(JsonDeserializerExceptionWrapper.java:64)
 at 
 com.google.gson.JsonDeserializationVisitor.invokeCustomDeserializer(JsonDeserializationVisitor.java:92)
 at 
 com.google.gson.JsonObjectDeserializationVisitor.visitFieldUsingCustomHandler(JsonObjectDeserializationVisitor.java:117)
 at 
 com.google.gson.ReflectingFieldNavigator.visitFieldsReflectively(ReflectingFieldNavigator.java:63)
 at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:120)
 at 
 com.google.gson.JsonDeserializationContextDefault.fromJsonObject(JsonDeserializationContextDefault.java:76)
 at 
 com.google.gson.JsonDeserializationContextDefault.deserialize(JsonDeserializationContextDefault.java:54)
 at com.google.gson.Gson.fromJson(Gson.java:551)
 at com.google.gson.Gson.fromJson(Gson.java:521)
 at 
 com.cloud.agent.transport.ArrayTypeAdaptor.deserialize(ArrayTypeAdaptor.java:80)
 at