[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15424303#comment-15424303
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9458:
--------------------------------------------

Github user koushik-das commented on the issue:

    https://github.com/apache/cloudstack/pull/1640
  
    @marcaurele Can you share the management server (MS) logs for this issue? 
We need to understand the exact cause of the VM restart. When an agent/host is 
detected as 'Down', CS checks whether the VMs on it are still alive; if a VM is 
found alive, nothing is done to it.
    
    Also, if you think the host was only disconnected intermittently, you can 
adjust the timeout after which CS starts investigating the host status. Try 
adjusting the ping.timeout configuration parameter to see if the issue is 
resolved.
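
To make the effect of that setting concrete, here is a rough sketch of how the timeout could be computed. It assumes that ping.timeout acts as a multiplier on ping.interval, which is my reading of the setting and should be checked against your CloudStack version. The class and method names are hypothetical, and the numbers in the usage note are purely illustrative.

```java
// Illustrative sketch only; names are hypothetical, not CloudStack code.
final class PingTimeoutSketch {

    private PingTimeoutSketch() {
    }

    /**
     * Seconds of silence tolerated before the host is investigated.
     * Assumption: ping.timeout is a multiplier applied to ping.interval.
     */
    static long effectiveTimeoutSeconds(long pingIntervalSeconds, double pingTimeoutMultiplier) {
        return (long) (pingIntervalSeconds * pingTimeoutMultiplier);
    }

    /** True when the last ping is older than the tolerated silence. */
    static boolean isBehindOnPing(long lastPingEpochSeconds, long nowEpochSeconds,
                                  long pingIntervalSeconds, double pingTimeoutMultiplier) {
        long silence = nowEpochSeconds - lastPingEpochSeconds;
        return silence > effectiveTimeoutSeconds(pingIntervalSeconds, pingTimeoutMultiplier);
    }
}
```

Under that assumption, an interval of 60 seconds with a multiplier of 2.5 tolerates 150 seconds of silence; raising the multiplier to 5 would tolerate 300 seconds, which may be enough to ride out a short disconnect.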
    
    The check for whether a VM is alive is done for all VMs, irrespective of 
whether HA is enabled. If a host is really down, it makes sense to mark its VMs 
as stopped. Additionally, for HA-enabled VMs, once they have been successfully 
fenced off, an attempt is made to restart them on other hosts in the cluster.
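
The per-VM handling described above can be summarized in a small sketch. This is only a model of the behavior as stated in this comment, not the actual HA manager code; the class, enum, and field names are all hypothetical.

```java
// Illustrative model of the described behavior; all names are hypothetical.
final class HostDownSketch {

    private HostDownSketch() {
    }

    enum Action {
        LEAVE_RUNNING,      // VM was found alive: nothing is done
        MARK_STOPPED,       // VM is not alive and HA is not enabled
        FENCE_AND_RESTART   // VM is not alive and HA is enabled: mark stopped,
                            // fence off, then try another host in the cluster
    }

    /** Minimal stand-in for a VM record. */
    static final class Vm {
        final String name;
        final boolean alive;      // outcome of the "is the VM alive?" check
        final boolean haEnabled;  // whether HA is enabled for this VM

        Vm(String name, boolean alive, boolean haEnabled) {
            this.name = name;
            this.alive = alive;
            this.haEnabled = haEnabled;
        }
    }

    /** Decide what to do with one VM on a host that was detected as 'Down'. */
    static Action decide(Vm vm) {
        if (vm.alive) {
            // The alive check runs for every VM, with or without HA.
            return Action.LEAVE_RUNNING;
        }
        return vm.haEnabled ? Action.FENCE_AND_RESTART : Action.MARK_STOPPED;
    }
}
```

In this model a VM that answers the alive check is never touched, whatever its HA setting, which is why the MS logs are needed to see what the check actually returned in the reported case.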


> Some VMs are being stopped when agent is reconnecting
> -----------------------------------------------------
>
>                 Key: CLOUDSTACK-9458
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9458
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>            Reporter: Marc-Aurèle Brothier
>            Assignee: Marc-Aurèle Brothier
>
> If you lose communication between the management server and one of the 
> agents for a few minutes, the HighAvailabilityManager kicks in and starts to 
> schedule VM restarts, even though HA mode is not active. Those tasks are 
> inserted as async jobs in the DB, and if the agent comes back online while 
> the jobs are still in the async table, they are pushed to the agent and shut 
> down the VMs. Then, since HA is not active, the VMs are not restarted.
> The expected behavior, in my opinion, is that the VMs should not be restarted 
> at all if HA mode is not active on them, and that the agent should update the 
> VM state with the power report.
> The bug lies in 
> {{HighAvailabilityManagerImpl.scheduleRestartForVmsOnHost(final HostVO host, 
> boolean investigate)}}, PR will follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
