[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15448760#comment-15448760
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9458:
--------------------------------------------

Github user abhinandanprateek commented on the issue:

    https://github.com/apache/cloudstack/pull/1640
  
    @marcaurele @koushik-das When the MS thinks that a VM is down, it issues 
a stop command. This is done to release the resources in the management server 
DB that are tied up for that VM. It has been seen several times that this 
actually kills a healthy VM. I have seen this issue in an MS cluster with 
agent.lb turned on.
    The issue is that we do need a state cleanup when a running VM is found to 
be stopped on the host, but this probably should not induce a shutdown on the 
host. Then again, this is a tricky boundary condition.
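
    To make the distinction concrete, here is a minimal sketch of a cleanup 
that only touches management-server state and never sends anything to the 
host. Every name in it (StateCleanupSketch, VmRecord, 
cleanupAfterPowerOffReport, the hostCommands queue) is hypothetical and made 
up for illustration; it is not CloudStack's actual code or API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types exist in CloudStack.
class StateCleanupSketch {
    enum State { RUNNING, STOPPED }

    /** Simplified stand-in for the management server's DB row for a VM. */
    static final class VmRecord {
        final String name;
        State state = State.RUNNING;
        boolean resourcesReserved = true;

        VmRecord(String name) {
            this.name = name;
        }
    }

    /**
     * Called when the host reports a VM as powered off while the DB still
     * says RUNNING. Only the DB-side record is updated; hostCommands (an
     * assumed outbound command queue) is deliberately left untouched, so a
     * stale or wrong report cannot shut down a healthy VM.
     */
    static void cleanupAfterPowerOffReport(VmRecord vm, List<String> hostCommands) {
        if (vm.state != State.RUNNING) {
            return; // nothing to reconcile
        }
        vm.state = State.STOPPED;
        vm.resourcesReserved = false;
        // Intentionally no hostCommands.add("stop " + vm.name) here.
    }

    public static void main(String[] args) {
        VmRecord vm = new VmRecord("vm-1");
        List<String> hostCommands = new ArrayList<>();
        cleanupAfterPowerOffReport(vm, hostCommands);
        System.out.println(vm.state + " reserved=" + vm.resourcesReserved
                + " commands=" + hostCommands.size());
    }
}
```

    Whether skipping the stop command is safe in every case (for example when 
the VM really is in a half-dead state on the host) is exactly the boundary 
condition mentioned above, and this sketch does not settle it.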



> Some VMs are being stopped when agent is reconnecting
> -----------------------------------------------------
>
>                 Key: CLOUDSTACK-9458
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9458
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>            Reporter: Marc-Aurèle Brothier
>            Assignee: Marc-Aurèle Brothier
>
> If you lose communication between the management server and one of the 
> agents for a few minutes, the HighAvailibilityManager kicks in and starts to 
> schedule VM restarts, even though HA mode is not active. Those tasks are 
> inserted as async jobs in the DB, and if the agent comes back online while 
> the jobs are still in the async table, they are pushed to the agent and shut 
> down the VMs. Then, since HA is not active, the VMs are not restarted.
> The expected behavior, in my opinion, is that a VM should not be restarted at 
> all if HA mode is not active on it; the agent should instead update the VM 
> state with the power report.
> The bug lies in 
> {{HighAvailibilityManagerImpl.scheduleRestartForVmsOnHost(final HostVO host, 
> boolean investigate)}}, PR will follow.
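
The expected behavior described in the issue can be sketched roughly as 
follows. This is an illustration only, not the actual patch: the Vm class, its 
haEnabled flag and vmsToScheduleForRestart are hypothetical names, and the real 
method works on CloudStack's own types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not the code from the pull request.
class HaRestartSketch {
    /** Simplified stand-in for a VM on the disconnected host. */
    static final class Vm {
        final String name;
        final boolean haEnabled;

        Vm(String name, boolean haEnabled) {
            this.name = name;
            this.haEnabled = haEnabled;
        }
    }

    /**
     * Returns the names of the VMs for which a restart job would be queued
     * when a host is considered down. VMs without HA are skipped, so no
     * async job exists that could later stop them once the agent reconnects.
     */
    static List<String> vmsToScheduleForRestart(List<Vm> vmsOnHost) {
        List<String> scheduled = new ArrayList<>();
        for (Vm vm : vmsOnHost) {
            if (!vm.haEnabled) {
                // Leave the state to be corrected by the agent's power report.
                continue;
            }
            scheduled.add(vm.name);
        }
        return scheduled;
    }

    public static void main(String[] args) {
        List<Vm> vms = Arrays.asList(new Vm("web-1", true), new Vm("db-1", false));
        System.out.println(vmsToScheduleForRestart(vms)); // prints [web-1]
    }
}
```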



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
