[jira] [Commented] (CLOUDSTACK-9458) Some VMs are being stopped when agent is reconnecting

ASF GitHub Bot (JIRA) Sun, 18 Sep 2016 22:58:54 -0700

    [ https://issues.apache.org/jira/browse/CLOUDSTACK-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15502392#comment-15502392 ]


ASF GitHub Bot commented on CLOUDSTACK-9458:
--------------------------------------------

Github user koushik-das commented on the issue:

    https://github.com/apache/cloudstack/pull/1640
  
    @abhinandanprateek In the latest master, the sequence of events described
    above only happens when the host has been determined to be 'Down' (see the
    code below), so the bug described won't happen. Previously, the same
    sequence was also triggered when the host state was 'Alert', which may
    have killed healthy VMs.
    
    > if (host != null && host.getStatus() == Status.Down) {
    >     _haMgr.scheduleRestartForVmsOnHost(host, true);
    > }
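    A minimal, self-contained sketch of the guard quoted above. The `Status`
    enum here is a simplified subset, and `RestartScheduler` is a hypothetical
    stand-in for `_haMgr`; host objects are reduced to a name. Only 'Down'
    triggers scheduling, while 'Alert' (for example, a host that is still
    reconnecting) does not.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical subset of host states; CloudStack's real enum has more values.
enum Status { Up, Alert, Down, Disconnected }

// Hypothetical stand-in for the HA manager's scheduling call.
interface RestartScheduler {
    void scheduleRestartForVmsOnHost(String hostName, boolean investigate);
}

class HostDownGuard {
    // Mirrors the quoted check: only a host already determined to be Down
    // causes VM restarts to be scheduled; Alert is ignored here.
    static boolean handleHostStatus(String hostName, Status status, RestartScheduler haMgr) {
        if (hostName != null && status == Status.Down) {
            haMgr.scheduleRestartForVmsOnHost(hostName, true);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> scheduled = new ArrayList<>();
        // Records which hosts would have had restarts scheduled.
        RestartScheduler recorder = (host, investigate) -> scheduled.add(host);
        handleHostStatus("host-a", Status.Alert, recorder);
        handleHostStatus("host-b", Status.Down, recorder);
        System.out.println(scheduled); // prints [host-b]
    }
}
```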
    
    If there is still a possibility of healthy VMs being killed, the scenario
    needs to be clearly identified. If anything needs fixing, the first step
    should be to improve the VM investigators rather than change the existing
    fencing logic.
    
    If we go ahead with the above fix, I can think of the following scenario
    that would break. When a host is genuinely down, non-HA VMs would remain in
    the 'Running' state and no operations could be performed on them. Currently,
    non-HA VMs are marked as 'Stopped' after fencing succeeds, so they can be
    started manually on another host.
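    A rough sketch of the post-fencing behavior described above, under stated
    assumptions: `Vm`, `VmState`, `FencingOutcome` and `afterSuccessfulFencing`
    are hypothetical names, and restarting HA-enabled VMs is assumed to be
    queued by a separate HA work item. This is not the actual CloudStack code
    path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified VM states.
enum VmState { Running, Stopped }

// Hypothetical VM record; haEnabled stands for the VM's HA setting.
class Vm {
    final String name;
    final boolean haEnabled;
    VmState state = VmState.Running;

    Vm(String name, boolean haEnabled) {
        this.name = name;
        this.haEnabled = haEnabled;
    }
}

class FencingOutcome {
    // Called only after the host has been fenced successfully.
    // Non-HA VMs are marked Stopped so a user can start them on another host;
    // HA-enabled VMs are returned as restart candidates (assumed to be queued elsewhere).
    static List<Vm> afterSuccessfulFencing(List<Vm> vmsOnHost) {
        List<Vm> restartCandidates = new ArrayList<>();
        for (Vm vm : vmsOnHost) {
            if (vm.haEnabled) {
                restartCandidates.add(vm);
            } else {
                vm.state = VmState.Stopped;
            }
        }
        return restartCandidates;
    }

    public static void main(String[] args) {
        Vm web = new Vm("web", true);
        Vm batch = new Vm("batch", false);
        List<Vm> toRestart = afterSuccessfulFencing(Arrays.asList(web, batch));
        System.out.println(toRestart.size() + " " + batch.state); // prints "1 Stopped"
    }
}
```

    In this sketch, skipping non-HA VMs entirely would leave `batch` in
    `Running` on a dead host, which is roughly the breakage the comment
    describes.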


> Some VMs are being stopped when agent is reconnecting
> -----------------------------------------------------
>
>                 Key: CLOUDSTACK-9458
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9458
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.)
>            Reporter: Marc-Aurèle Brothier
>            Assignee: Marc-Aurèle Brothier
>
> If the communication between the management server and one of the agents is
> lost for a few minutes, the HighAvailabilityManager kicks in and starts
> scheduling VM restarts, even though HA mode is not active. Those tasks are
> inserted as async jobs in the DB, and if the agent comes back online while the
> jobs are still in the async table, they are pushed to the agent and shut down
> the VMs. Then, since HA is not active, the VMs are not restarted.
> In my opinion, the expected behavior is that VMs should not be restarted at
> all if HA mode is not active on them; the agent should be left to update the
> VM state through its power report.
> The bug lies in
> {{HighAvailabilityManagerImpl.scheduleRestartForVmsOnHost(final HostVO host, boolean investigate)}};
> a PR will follow.
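A minimal sketch of the behavior the reporter proposes, assuming a
hypothetical `GuestVm` record and a hypothetical `vmsToSchedule` helper (not
CloudStack's actual `scheduleRestartForVmsOnHost` body): only HA-enabled VMs
receive a restart work item, and non-HA VMs are left alone so the agent's
power report can reconcile their state after it reconnects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal VM record: only the fields this sketch needs.
class GuestVm {
    final String name;
    final boolean haEnabled;

    GuestVm(String name, boolean haEnabled) {
        this.name = name;
        this.haEnabled = haEnabled;
    }
}

class ProposedRestartFilter {
    // Returns the names of VMs that would get a restart work item under the
    // proposal; non-HA VMs are skipped rather than stopped.
    static List<String> vmsToSchedule(List<GuestVm> vmsOnHost) {
        List<String> scheduled = new ArrayList<>();
        for (GuestVm vm : vmsOnHost) {
            if (vm.haEnabled) {
                scheduled.add(vm.name);
            }
        }
        return scheduled;
    }

    public static void main(String[] args) {
        List<GuestVm> vms = Arrays.asList(new GuestVm("db", true), new GuestVm("dev", false));
        System.out.println(vmsToSchedule(vms)); // prints [db]
    }
}
```

The trade-off raised in the comment applies here: if the host is genuinely
down, `dev` in this example would never be marked as stopped.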



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)