++Koushik

Thanks Sowmya

Koushik 

Any reason why this is cannot be applied to ACS 4.1?  Since I'm Release Manager 
for 4.1.x branch, I will work on testing and releasing 4.1.2

Thanks
ilya

> -----Original Message-----
> From: Sowmya Krishnan [mailto:sowmya.krish...@citrix.com]
> Sent: Thursday, September 26, 2013 3:03 AM
> To: dev@cloudstack.apache.org
> Subject: RE: [MAJOR][BUG] ACS powers off some VMs in vSphere - when MS
> service is restarted
> 
> Ilya,
> Is this similar to this issue:
> https://issues.apache.org/jira/browse/CLOUDSTACK-4636
> Fix for this has gone in 4.2-forward and master. As stated in the bug, this
> could happen if more than one thread tries to process the same host post
> MS restart.
> 
> > -----Original Message-----
> > From: Musayev, Ilya [mailto:imusa...@webmd.net]
> > Sent: Thursday, September 26, 2013 3:43 AM
> > To: dev@cloudstack.apache.org
> > Subject: RE: [MAJOR][BUG] ACS powers off some VMs in vSphere - when
> MS
> > service is restarted
> >
> > Sorry if the error log is abit hard to read, as an example, please
> > track the vmname i-2-262-acs-docs-fc17.
> >
> > 2013-09-25 14:35:49,928 DEBUG [vmware.resource.VmwareResource]
> > (AgentTaskPool-1:null) Detecting a new state but couldn't find a old
> > state so adding it to the changes: i-2-262-acs-docs-fc17
> > 2013-09-25 14:35:53,614 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> > (AgentTaskPool-1:null) VM i-2-262-acs-docs-fc17: cs state = Running
> > and realState = Stopped
> > 2013-09-25 14:35:53,614 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> > (AgentTaskPool-1:null) VM i-2-262-acs-docs-fc17: cs state = Running
> > and realState = Stopped
> > 2013-09-25 14:35:53,614 INFO  [cloud.ha.HighAvailabilityManagerImpl]
> > (AgentTaskPool-1:null) Skip HA for VMware VM i-2-262-acs-docs-fc17
> > 2013-09-25 14:35:53,694 DEBUG [agent.transport.Request]
> > (AgentTaskPool-
> > 1:null) Seq 11-1418264581: Sending  { Cmd , MgmtId: 345049078181, via:
> > 11,
> > Ver: v1, Flags: 100101,
> > [{"StopCommand":{"isProxy":false,"vmName":"i-2-262-
> > acs-docs-fc17","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-
> > 2-
> > 278-demo01t-ops-
> > 08","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-281-acs-
> > appliance","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-28
> > 3-
> > vmbld01l-ops-
> 08","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"
> > i-2-
> > 285-ossec01l-ops-08","wait":0}}] }
> > 2013-09-25 14:35:53,695 DEBUG [agent.transport.Request]
> > (AgentTaskPool-
> > 1:null) Seq 11-1418264581: Executing:  { Cmd , MgmtId: 345049078181,
> > via: 11,
> > Ver: v1, Flags: 100101,
> > [{"StopCommand":{"isProxy":false,"vmName":"i-2-262-
> > acs-docs-fc17","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-
> > 2-
> > 278-demo01t-ops-
> > 08","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-281-acs-
> > appliance","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-28
> > 3-
> > vmbld01l-ops-
> 08","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"
> > i-2-
> > 285-ossec01l-ops-08","wait":0}}] }
> > 2013-09-25 14:35:53,702 INFO  [vmware.resource.VmwareResource]
> > (DirectAgent-3:vmha62d-ops-08.portal.webmd.com) Executing resource
> > StopCommand:
> > {"isProxy":false,"vmName":"i-2-262-acs-docs-fc17","wait":0}
> > 2013-09-25 14:35:53,703 DEBUG [vmware.mo.HostMO] (DirectAgent-
> > 3:vmha62d-ops-08.portal.webmd.com) find VM i-2-262-acs-docs-fc17 on
> > host
> > 2013-09-25 14:35:54,753 INFO  [vmware.resource.VmwareResource]
> > (DirectAgent-3:vmha62d-ops-08.portal.webmd.com) VM
> > i-2-262-acs-docs-fc17 is already in stopped state
> >
> > > -----Original Message-----
> > > From: Musayev, Ilya [mailto:imusa...@webmd.net]
> > > Sent: Wednesday, September 25, 2013 6:08 PM
> > > To: dev@cloudstack.apache.org
> > > Subject: [MAJOR][BUG] ACS powers off some VMs in vSphere - when MS
> > > service is restarted
> > >
> > > Hi All,
> > >
> > > I'd like to raise an awareness on the issue with ACS 4.1.1 (I assume
> > > other versions are affected - since I could not find any changes in
> > > the latest code that would state otherwise).
> > >
> > > On MS server start/restart, it checks all the VMs for their state,
> > > if some reason state is either not found or comes as stopped - which
> > > is inaccurate, the vms will *power off*.
> > >
> > > On todays occurrence, half of my plant  went down because ACS
> > > invoked StopCommand on the vms that either had no state or for some
> > > reason agentState == Stopped.
> > >
> > > The details of this issue is in
> > > https://issues.apache.org/jira/browse/CLOUDSTACK-4740
> > >
> > > The error in the logs is:
> > >
> > > 2013-09-25 14:35:49,928 DEBUG [vmware.resource.VmwareResource]
> > > (AgentTaskPool-1:null) Detecting a new state but couldn't find a old
> > > state so adding it to the changes: i-2-262-acs-docs-fc17
> > > 2013-09-25 14:35:51,213 DEBUG [agent.transport.Request]
> > > (AgentTaskPool-
> > > 1:null) Seq -1--1: Startup request from directly connected host: {
> > > Cmd ,
> > > MgmtId: -1, via: -1, Ver: v1, Flags: 11,
> > >
> [{"cpus":16,"speed":2199,"memory":68683468800,"dom0MinMemory":0,"po
> > > olSync":false,"vms":{"i-8-270-CLOUD411":{"state":"Running"},"r-15-
> > > CLOUD41-OLD":{"state":"Stopped"},"v-260-
> > > CLOUD411":{"state":"Running"},"i-2-283-vmbld01l-ops-
> > > 08":{"state":"Running"},"i-2-104-ACS41VM":{"state":"Running"},"--s-1
> > > -
> > > CLOUD41-OLD":{"state":"Running"},"i-27-280-
> > > CLOUD411":{"state":"Running"},"i-2-285-ossec01l-ops-
> > > 08":{"state":"Running"},"i-2-262-acs-docs-fc17":{"state":"Stopped"},
> > > "i
> > > -24-
> > > 265-test3":{"state":"Running"},"cloud01l-ops-
> > > 08.portal.webmd.com":{"state":"Running"},"i-2-278-demo01t-ops-
> > > 08":{"state":"Running"},"s-63-CLOUD411":{"state":"Running"},"r-66-
> > > CLOUD411":{"state":"Running"},"i-2-281-acs-
> > >
> appliance":{"state":"Running"}},"caps":"hvm","hypervisorType":"VMware","
> > > hostDetails":{"com.cloud.network.Networks.RouterPrivateIpStrategy":"
> > > Dc
> > > Gl
> > > obal","NativeHA":"true"},"hypervisorVersion":"5.0","type":"Routing",
> > > "d
> > > ataC
> > > enter":"2","pod":"2","cluster":"3","guid":"HostSystem:host-19716@vc0
> > > 0q
> > > -
> > > ops-08.portal.webmd.com","name":"vmha62d-ops-
> > > 08.portal.webmd.com","version":"4.1.1-
> > >
> SNAPSHOT","privateIpAddress":"172.25.243.31","privateMacAddress":"68
> > > :b
> > > 5
> > >
> :99:73:0b:c2","privateNetmask":"255.255.255.0","storageIpAddress":"172.25.
> > > 243.31","storageNetmask":"255.255.255.0","storageMacAddress":"68:b5:
> > > 99
> > > :7
> > > 3:0b:c2","wait":0},{"totalSize":0,"poolInfo":{"uuid":"72c8aedb-58c4-
> > > 45
> > > 69-
> > > ac51-adc5af770bf6","host":"vmha62d-ops-
> > > 08.portal.webmd.com","localPath":"","hostPath":"datastore-
> > >
> 19718","poolType":"LVM","capacityBytes":141465485312,"availableBytes":
> > > 14
> > >
> 0383354880},"resourceType":"STORAGE_POOL","hostDetails":{},"type":"S
> > > to
> > > r
> > > age","dataCenter":"2","pod":"2","cluster":"3","guid":"72c8aedb-58c4-
> > > 45
> > > 69-
> > > ac51-adc5af770bf6","name":"72c8aedb-58c4-4569-ac51-
> > > adc5af770bf6","wait":0}] }
> > > 2013-09-25 14:35:53,614 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> > > (AgentTaskPool-1:null) VM i-2-262-acs-docs-fc17: cs state = Running
> > > and realState = Stopped
> > > 2013-09-25 14:35:53,614 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> > > (AgentTaskPool-1:null) VM i-2-262-acs-docs-fc17: cs state = Running
> > > and realState = Stopped
> > > 2013-09-25 14:35:53,614 INFO [cloud.ha.HighAvailabilityManagerImpl]
> > > (AgentTaskPool-1:null) Skip HA for VMware VM i-2-262-acs-docs-fc17
> > > 2013-09-25 14:35:53,694 DEBUG [agent.transport.Request]
> > > (AgentTaskPool-
> > > 1:null) Seq 11-1418264581: Sending { Cmd , MgmtId: 345049078181, via:
> > > 11,
> > > Ver: v1, Flags: 100101,
> > > [{"StopCommand":{"isProxy":false,"vmName":"i-2-
> > > 262-acs-docs-
> > > fc17","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-278-
> > > demo01t-ops-
> 08","wait":0}},{"StopCommand":{"isProxy":false,"vmName":
> > > "i
> > > -
> > > 2-281-acs-
> > > appliance","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-
> > > 28
> > > 3-
> > > vmbld01l-ops-
> 08","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"
> > > i-
> > > 2-285-ossec01l-ops-08","wait":0}}] }
> > > 2013-09-25 14:35:53,695 DEBUG [agent.transport.Request]
> > > (AgentTaskPool-
> > > 1:null) Seq 11-1418264581: Executing: { Cmd , MgmtId: 345049078181,
> > > via: 11,
> > > Ver: v1, Flags: 100101,
> > > [{"StopCommand":{"isProxy":false,"vmName":"i-2-
> > > 262-acs-docs-
> > > fc17","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-278-
> > > demo01t-ops-
> 08","wait":0}},{"StopCommand":{"isProxy":false,"vmName":
> > > "i
> > > -
> > > 2-281-acs-
> > > appliance","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"i-2-
> > > 28
> > > 3-
> > > vmbld01l-ops-
> 08","wait":0}},{"StopCommand":{"isProxy":false,"vmName":"
> > > i-
> > > 2-285-ossec01l-ops-08","wait":0}}] }
> > > 2013-09-25 14:35:53,702 INFO [vmware.resource.VmwareResource]
> > > (DirectAgent-3:vmha62d-ops-08.portal.webmd.com) Executing resource
> > > StopCommand:
> > > {"isProxy":false,"vmName":"i-2-262-acs-docs-fc17","wait":0}
> > > 2013-09-25 14:35:53,703 DEBUG [vmware.mo.HostMO] (DirectAgent-
> > > 3:vmha62d-ops-08.portal.webmd.com) find VM i-2-262-acs-docs-fc17 on
> > > host
> > > 2013-09-25 14:35:54,753 INFO [vmware.resource.VmwareResource]
> > > (DirectAgent-3:vmha62d-ops-08.portal.webmd.com) VM i-2-262-acs-
> docs-
> > > fc17 is already in stopped state
> > > 2013-09-25 14:37:50,264 DEBUG [agent.transport.Request]
> > > (DirectAgent-
> > > 3:null) Seq 11-1418264581: Processing: { Ans: , MgmtId:
> > > 345049078181,
> > > via: 11,
> > > Ver: v1, Flags: 100,
> > > [{"StopAnswer":{"vncPort":0,"result":true,"details":"VM
> > > i-2-262-acs-docs-fc17 is already in stopped
> > > state","wait":0}},{"StopAnswer":{"vncPort":0,"result":true,"details":"
> > > Stop
> > > VM i-2-278-demo01t-ops-08
> > > Succeed","wait":0}},{"StopAnswer":{"vncPort":0,"result":true,"details"
> > > :"Stop
> > > VM i-2-281-acs-appliance
> > > Succeed","wait":0}},{"StopAnswer":{"vncPort":0,"result":true,"details"
> > > :"Stop
> > > VM i-2-283-vmbld01l-ops-08
> > > Succeed","wait":0}},{"StopAnswer":{"vncPort":0,"result":true,"details"
> > > :"Stop VM i-2-285-ossec01l-ops-08 Succeed","wait":0}}] }
> > > 2013-09-25 14:38:14,528 DEBUG [vmware.mo.HostMO] (DirectAgent-
> > > 20:vmha62d-ops-08.portal.webmd.com) find VM i-2-262-acs-docs-fc17 on
> > > host
> > >
> > > I wonder why this was not noted earlier.
> > >
> > > The code in question is below, and comes from
> > >
> engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java
> > > on master branch.
> > >
> > > 2365 if (serverState == State.Running) {
> > > 2366 // Our records showed that it should be running so let's
> > > restart
> > > 2367 // it.
> > > 2368 _haMgr.scheduleRestart(vm, false);
> > > 2369 } else if (serverState == State.Stopping) {
> > > 2370 _haMgr.scheduleStop(vm, hostId, WorkType.ForceStop);
> > > 2371 s_logger.debug("Scheduling a check stop for VM in stopping mode:
> > > " + vm);
> > > 2372 } else if (serverState == State.Starting) {
> > > 2373 s_logger.debug("Ignoring VM in starting mode: " +
> > > vm.getInstanceName());
> > > 2374 _haMgr.scheduleRestart(vm, false);
> > > 2375 }
> > > 2376 command = cleanup(vm);
> > >
> > > If there is a quick fix, which I think for now, I can just comment
> > > out the "command= cleanup(vm)", please let me know. I'm not certain
> > > what would be the repercussions.
> > >
> > > As always any feedback, is appreciated.
> > >
> > > Thanks
> > > ilya
> > >
> > >
> >
> 


Reply via email to