[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196733#comment-14196733
 ] 

edison su commented on CLOUDSTACK-5452:
---------------------------------------

This is due to a limitation of the current agent model: a task that is 
already running on the agent side cannot be cancelled.
The problem is:
if a running task takes forever to finish, we can't do anything about it 
except restart the agent and kill all the running processes spawned by the 
Java agent. 
Human intervention is needed in this case. We have to kill these jobs 
manually; otherwise the system will be left in an inconsistent state.
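
For illustration only, here is a minimal sketch of what a cancellable task 
model could look like in plain Java. It is not CloudStack code: the class 
CancellableTaskRegistry and its methods are hypothetical names made up for 
this example. It assumes each task is tracked by an id and that the task 
responds to thread interruption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

// Hypothetical sketch; none of these names exist in the CloudStack agent.
class CancellableTaskRegistry {
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final Map<Long, Future<?>> inProgress = new ConcurrentHashMap<>();

    /** Starts a task and tracks it under the given id so it can be cancelled later. */
    public void submit(long id, Runnable task) {
        FutureTask<Void> future = new FutureTask<Void>(() -> {
            try {
                task.run();
            } finally {
                // Stop tracking the task once it ends, normally or not.
                inProgress.remove(id);
            }
        }, null);
        // Register before executing so a fast task cannot finish untracked.
        inProgress.put(id, future);
        pool.execute(future);
    }

    /** Interrupts every tracked task; returns how many were actually cancelled. */
    public int cancelAll() {
        int cancelled = 0;
        for (Map.Entry<Long, Future<?>> entry : inProgress.entrySet()) {
            // cancel(true) interrupts the worker thread if the task is running.
            if (entry.getValue().cancel(true)) {
                cancelled++;
            }
            inProgress.remove(entry.getKey());
        }
        return cancelled;
    }

    /** Number of tasks that are still tracked as in progress. */
    public int inProgressCount() {
        return inProgress.size();
    }

    /** Stops the worker pool and interrupts anything still running. */
    public void shutdown() {
        pool.shutdownNow();
    }
}
```

With something like this, a reconnect path could call cancelAll() instead of 
waiting indefinitely for the in-progress count to reach zero. Note that 
Future.cancel(true) only interrupts the Java thread. A child process spawned 
by the task would still have to be killed separately, for example with 
Process.destroy(), so this sketch alone would not remove the need for cleanup 
of external processes.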

> KVM - Agent is not able to connect back if management server was restarted 
> when there are pending tasks to this host.
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-5452
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-5452
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>          Components: Management Server
>    Affects Versions: 4.3.0
>         Environment: Build from 4.3
>            Reporter: Sangeetha Hariharan
>            Assignee: edison su
>            Priority: Critical
>             Fix For: 4.5.0
>
>
> KVM - Agent is not able to connect back if management server was restarted 
> when there are pending tasks to this host.
> Steps to reproduce the problem:
> Set up - Advanced zone with 2 KVM (RHEL 6.3) hosts.
> Deployed a few VMs.
> Started snapshots for the ROOT volumes of the VMs.
> While the snapshot processes are still in progress, restart the management 
> server.
> When the management server has started, the KVM hosts remain in disconnected 
> state.
> Attempts to stop or start VMs fail because there is no connection to the 
> host.
> Following is seen in agent logs:
> 2013-12-10 20:56:46,640 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost 
> connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:56:46,640 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) 
> Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:56:51,641 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost 
> connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:56:51,642 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) 
> Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:56:56,642 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost 
> connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:56:56,643 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) 
> Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:57:01,644 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost 
> connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:57:01,644 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) 
> Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:57:06,644 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost 
> connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:57:06,645 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) 
> Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:57:11,645 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost 
> connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:57:11,646 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) 
> Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:57:16,647 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost 
> connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:57:16,647 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) 
> Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:57:21,648 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost 
> connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:57:21,648 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) 
> Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:57:26,649 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost 
> connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:57:26,675 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) 
> Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:57:31,676 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost 
> connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:57:31,677 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) 
> Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:57:36,678 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost 
> connection to the server. Dealing with the remaining commands...
> 2013-12-10 20:57:36,678 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) 
> Cannot connect because we still have 1 commands in progress.
> 2013-12-10 20:57:41,678 INFO  [cloud.agent.Agent] (Agent-Handler-2:null) Lost 
> connection to the server. Dealing with the remaining commands...
> :



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
