Sheng Yang created CLOUDSTACK-689:
-------------------------------------
Summary: RVR: Stop pending flag is not cleared when user start the
disconnected router from another host
Key: CLOUDSTACK-689
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-689
Project: CloudStack
Issue Type: Bug
Security Level: Public (Anyone can view this level - this is the default.)
Components: Network Controller
Affects Versions: 4.0.0
Reporter: Sheng Yang
Assignee: Sheng Yang
Fix For: 4.1.0
Here is what's happened:
2012-09-16 21:27:24,285 WARN
[network.router.VirtualNetworkApplianceManagerImpl] (HA-Worker-0:work-37779)
Unable to apply dhcp entry on disconnected router r-38445-VM
com.cloud.exception.AgentUnavailableException: Resource [Host:610] is
unreachable: Host 610: Host is not in the right state: Down
[...]
2012-09-16 21:27:24,286 DEBUG
[network.router.VirtualNetworkApplianceManagerImpl] (HA-Worker-0:work-37779)
About to stop the router r-38445-VM due to: Unable to apply dhcp entry on
disconnected router
At the above time, we set stop pending flag, due to host 610 is down, and we
failed to reach the router r-38445-VM.
Then:
2012-09-16 21:51:54,067 DEBUG [cloud.async.AsyncJobManagerImpl]
(catalina-exec-28:null) submit async job-296045, details: AsyncJobVO
{id:296045, userId: 6217, accountId: 6178, sessionKey: null, instanceType:
DomainRouter, instanceId: 38445, cmd: com.cloud.api.commands.StartRouterCmd,
cmdOriginator: null, cmdInfo:
{"response":"json","id":"f14313a7-5490-408f-a9d7-292303784f8f","sessionkey":"lByhfEpE7fygtOASvMH3i8xgEok\u003d","ctxUserId":"6217","_":"1347799961307","ctxAccountId":"6178","ctxStartEventId":"1533940"},
cmdVersion: 0, callbackType: 0, callbackAddress: null, status: 0,
processStatus: 0, resultCode: 0, result: null, initMsid: 152687106451262,
completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
2012-09-16 21:51:54,070 DEBUG [cloud.async.AsyncJobManagerImpl]
(Job-Executor-126:job-296045) Executing com.cloud.api.commands.StartRouterCmd
for job-296045
User start the router manually, and CS deploy it on host 606:
2012-09-16 21:51:56,069 DEBUG [cloud.vm.VirtualMachineManagerImpl]
(Job-Executor-126:job-296045) Deployment found -
P0=VM[DomainRouter|r-38445-VM],
P0=Dest[Zone(Id)-Pod(Id)-Cluster(Id)-Host(Id)-Storage(Volume(Id|Type-->Pool(Id))]
: Dest[Zone(
2)-Pod(29)-Cluster(52)-Host(606)-Storage()]
2012-09-16 21:51:56,090 DEBUG [cloud.capacity.CapacityManagerImpl]
(Job-Executor-126:job-296045) VM state transitted from :Starting to Starting
with event: OperationRetryvm's original host id: 610 new host id: 606 host id
before state transiti
on: null
We didn't cover this situation in our code. And this won't clear stop pending
flag of course.
Also the stop pending flag can be set if the command sent to router is time
out, but host is not detected as Disconnected by CS.
See the mgmt log attached for r-5-VM:
2012-09-29 22:20:55,413 DEBUG [agent.manager.AgentAttache]
(Job-Executor-130:job-450) Seq 1-1662189781: Cancelling.
2012-09-29 22:20:55,413 WARN
[network.router.VirtualNetworkApplianceManagerImpl] (Job-Executor-130:job-450)
Timed Out
com.cloud.exception.OperationTimedoutException: Commands 1662189781 to Host 1
timed out after 3600
at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:425)
at
com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:501)
at
com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:1022)
at
com.cloud.network.router.VirtualNetworkApplianceManagerImpl.sendCommandsToRouter(VirtualNetworkApplianceManagerImpl.java:2590)
at
com.cloud.network.router.VirtualNetworkApplianceManagerImpl.sendPortForwardingRules(VirtualNetworkApplianceManagerImpl.java:2725)
at
com.cloud.network.router.VirtualNetworkApplianceManagerImpl$5.execute(VirtualNetworkApplianceManagerImpl.java:2703)
at
com.cloud.network.router.VirtualNetworkApplianceManagerImpl.applyRules(VirtualNetworkApplianceManagerImpl.java:2809)
at
com.cloud.network.router.VirtualNetworkApplianceManagerImpl.applyFirewallRules(VirtualNetworkApplianceManagerImpl.java:2688)
at
com.cloud.network.element.VirtualRouterElement.applyPFRules(VirtualRouterElement.java:636)
at
com.cloud.network.NetworkManagerImpl.applyRules(NetworkManagerImpl.java:3245)
at
com.cloud.network.firewall.FirewallManagerImpl.applyRules(FirewallManagerImpl.java:374)
at
com.cloud.network.rules.RulesManagerImpl.applyPortForwardingRulesForNetwork(RulesManagerImpl.java:714)
at
com.cloud.network.NetworkManagerImpl.reprogramNetworkRules(NetworkManagerImpl.java:3516)
at
com.cloud.network.NetworkManagerImpl.implementNetworkElementsAndResources(NetworkManagerImpl.java:1812)
at
com.cloud.network.NetworkManagerImpl.implementNetwork(NetworkManagerImpl.java:1754)
at
com.cloud.utils.db.DatabaseCallback.intercept(DatabaseCallback.java:30)
at
com.cloud.network.NetworkManagerImpl.startNetwork(NetworkManagerImpl.java:3436)
at
com.cloud.network.router.VirtualNetworkApplianceManagerImpl.startRouter(VirtualNetworkApplianceManagerImpl.java:2285)
at
com.cloud.network.router.VirtualNetworkApplianceManagerImpl.startRouter(VirtualNetworkApplianceManagerImpl.java:2255)
at
com.cloud.event.ActionEventCallback.intercept(ActionEventCallback.java:32)
at
com.cloud.api.commands.StartRouterCmd.execute(StartRouterCmd.java:101)
at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:134)
at
com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:427)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
2012-09-29 22:20:55,414 WARN
[network.router.VirtualNetworkApplianceManagerImpl] (Job-Executor-130:job-450)
Unable to apply firewall rules on disconnected router r-5-VM
com.cloud.exception.AgentUnavailableException: Resource [Host:1] is
unreachable: Host 1: Unable to send commands to virtual router
at
com.cloud.network.router.VirtualNetworkApplianceManagerImpl.sendCommandsToRouter(VirtualNetworkApplianceManagerImpl.java:2593)
at
com.cloud.network.router.VirtualNetworkApplianceManagerImpl.sendPortForwardingRules(VirtualNetworkApplianceManagerImpl.java:2725)
at
com.cloud.network.router.VirtualNetworkApplianceManagerImpl$5.execute(VirtualNetworkApplianceManagerImpl.java:2703)
at
com.cloud.network.router.VirtualNetworkApplianceManagerImpl.applyRules(VirtualNetworkApplianceManagerImpl.java:2809)
at
com.cloud.network.router.VirtualNetworkApplianceManagerImpl.applyFirewallRules(VirtualNetworkApplianceManagerImpl.java:2688)
at
com.cloud.network.element.VirtualRouterElement.applyPFRules(VirtualRouterElement.java:636)
at
com.cloud.network.NetworkManagerImpl.applyRules(NetworkManagerImpl.java:3245)
at
com.cloud.network.firewall.FirewallManagerImpl.applyRules(FirewallManagerImpl.java:374)
at
com.cloud.network.rules.RulesManagerImpl.applyPortForwardingRulesForNetwork(RulesManagerImpl.java:714)
at
com.cloud.network.NetworkManagerImpl.reprogramNetworkRules(NetworkManagerImpl.java:3516)
at
com.cloud.network.NetworkManagerImpl.implementNetworkElementsAndResources(NetworkManagerImpl.java:1812)
at
com.cloud.network.NetworkManagerImpl.implementNetwork(NetworkManagerImpl.java:1754)
at
com.cloud.utils.db.DatabaseCallback.intercept(DatabaseCallback.java:30)
at
com.cloud.network.NetworkManagerImpl.startNetwork(NetworkManagerImpl.java:3436)
at
com.cloud.network.router.VirtualNetworkApplianceManagerImpl.startRouter(VirtualNetworkApplianceManagerImpl.java:2285)
at
com.cloud.network.router.VirtualNetworkApplianceManagerImpl.startRouter(VirtualNetworkApplianceManagerImpl.java:2255)
at
com.cloud.event.ActionEventCallback.intercept(ActionEventCallback.java:32)
at
com.cloud.api.commands.StartRouterCmd.execute(StartRouterCmd.java:101)
at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:134)
at
com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:427)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Caused by: com.cloud.exception.OperationTimedoutException: Commands 1662189781
to Host 1 timed out after 3600
at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:425)
at
com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:501)
at
com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:1022)
at
com.cloud.network.router.VirtualNetworkApplianceManagerImpl.sendCommandsToRouter(VirtualNetworkApplianceManagerImpl.java:2590)
... 33 more
2012-09-29 22:20:55,415 DEBUG
[network.router.VirtualNetworkApplianceManagerImpl] (Job-Executor-130:job-450)
About to stop the router r-5-VM due to: Unable to apply firewall rules on
disconnected router
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira