[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-5582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854433#comment-13854433
 ] 

edison su commented on CLOUDSTACK-5582:
---------------------------------------

The HA manager has a bug that was introduced by Alex's commit: 
5297a071d2c20040878950172b8d0211ac7cb436

In HaManagerImpl->scheduleRestart, if investigate is passed as "false", which is 
the case when the KVM agent connects back to the management server, the code 
stops the VM but does not reload the vm object, so this line of code:

HaWorkVO work = new HaWorkVO(vm.getId(), vm.getType(), WorkType.HA, investigate ? Step.Investigating : Step.Scheduled, hostId, vm.getState(), maxRetries + 1, vm.getUpdated());

stores the VM state as Running in the HaWorkVO.

Then this code is reached:

        s_logger.info("HA on " + vm);
        if (vm.getState() != work.getPreviousState() || vm.getUpdated() != work.getUpdateTime()) {
            s_logger.info("VM " + vm + " has been changed.  Current State = " + vm.getState() + " Previous State = " + work.getPreviousState() + " last updated = " + vm.getUpdated()
                    + " previous updated = " + work.getUpdateTime());
            return null;
        }

Because the VM is by then Stopped while the work item recorded Running, the 
check fails and HA is not triggered. 

The fix is to reload the VM state in scheduleRestart after 
_itMgr.advanceStop(vm.getUuid(), true); is called.
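
To illustrate the stale-object problem, here is a minimal self-contained 
sketch. It is not the CloudStack code: the Vm, Work, db, advanceStop, 
haProceeds and schedule names are hypothetical stand-ins, and the in-memory 
map only imitates the database. It assumes the stop updates both the state 
and the update counter of the stored VM record.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleVmSketch {
    enum State { Running, Stopped }

    // Hypothetical immutable snapshot of a VM row (stand-in for the real VO).
    static final class Vm {
        final long id;
        final State state;
        final long updated;
        Vm(long id, State state, long updated) {
            this.id = id;
            this.state = state;
            this.updated = updated;
        }
    }

    // Hypothetical HA work item that remembers the VM state at scheduling time.
    static final class Work {
        final long vmId;
        final State previousState;
        final long updateTime;
        Work(long vmId, State previousState, long updateTime) {
            this.vmId = vmId;
            this.previousState = previousState;
            this.updateTime = updateTime;
        }
    }

    // Hypothetical in-memory "database" keyed by VM id.
    static final Map<Long, Vm> db = new HashMap<>();

    // Imitates the stop: the stored row changes, snapshots held by callers do not.
    static void advanceStop(long id) {
        Vm old = db.get(id);
        db.put(id, new Vm(id, State.Stopped, old.updated + 1));
    }

    // Mirrors the check in the comment above: HA proceeds only if nothing changed.
    static boolean haProceeds(Work work) {
        Vm current = db.get(work.vmId);
        return current.state == work.previousState
                && current.updated == work.updateTime;
    }

    // Schedules a restart, optionally reloading the VM after stopping it.
    static boolean schedule(boolean reloadAfterStop) {
        db.put(1L, new Vm(1L, State.Running, 10L));
        Vm vm = db.get(1L);
        advanceStop(vm.id);
        if (reloadAfterStop) {
            vm = db.get(vm.id); // the proposed fix: pick up the post-stop state
        }
        Work work = new Work(vm.id, vm.state, vm.updated);
        return haProceeds(work);
    }

    public static void main(String[] args) {
        System.out.println("without reload: HA proceeds = " + schedule(false));
        System.out.println("with reload:    HA proceeds = " + schedule(true));
    }
}
```

Without the reload the work item records the pre-stop state, so the later 
comparison fails; with the reload the recorded state matches and the HA work 
can continue.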



> kvm - HA is not triggered when host is powered down since the host gets into 
> "Disconnected" state. 
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-5582
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-5582
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>          Components: Management Server
>    Affects Versions: 4.3.0
>         Environment: Build from 4.3
>            Reporter: Sangeetha Hariharan
>            Assignee: edison su
>            Priority: Critical
>             Fix For: 4.3.0
>
>
> kvm - HA is not triggered when host is powered down since the host gets into 
> "Disconnected" state.
> Advanced zone with  2 KVM (RHEL 6.3) hosts.
> Steps to reproduce the problem:
> Deploy few Vms in each of the hosts .
> Power down one of the hosts ( using IPMI).
> We see that the host gets into "Disconnected" state.
> All the Vms that are running in this host continue to be in "Up" state.
> This happens because of management server receiving a explicit shutdown 
> request from the agent:
> 2013-12-19 21:06:37,262 DEBUG [c.c.a.m.AgentManagerImpl] 
> (AgentManager-Handler-15:null) SeqA 2--1: Processing Seq 2--1:  { Cmd , 
> MgmtId: -1, via: 2, Ver: v1, Flags: 111, 
> [{"com.cloud.agent.api.ShutdownCommand":{"reason":"sig.kill","wait":0}}] }
> 2013-12-19 21:06:37,263 INFO  [c.c.a.m.AgentManagerImpl] 
> (AgentManager-Handler-15:null) Host 2 has informed us that it is shutting 
> down with reason sig.kill and detail null
> 2013-12-19 21:06:37,263 INFO  [c.c.a.m.AgentManagerImpl] 
> (AgentTaskPool-1:ctx-a32ed8e2) Host 2 is disconnecting with event 
> ShutdownRequested
> 2013-12-19 21:06:37,264 DEBUG [c.c.a.m.AgentManagerImpl] 
> (AgentTaskPool-1:ctx-a32ed8e2) The next status of agent 2is Disconnected, 
> current status is Up
> 2013-12-19 21:06:37,264 DEBUG [c.c.a.m.AgentManagerImpl] 
> (AgentTaskPool-1:ctx-a32ed8e2) Deregistering link for 2 with state 
> Disconnected
> 2013-12-19 21:06:37,264 DEBUG [c.c.a.m.AgentManagerImpl] 
> (AgentTaskPool-1:ctx-a32ed8e2) Remove Agent : 2
> 2013-12-19 21:06:37,264 DEBUG [c.c.a.m.ConnectedAgentAttache] 
> (AgentTaskPool-1:ctx-a32ed8e2) Processing Disconnect.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
