Hi Andrei,

Missed that mail, sorry. I created a PR that allows for longer timeouts [1].
Also, you can bump the router.aggregation.command.each.timeout global setting to, say, 15-30 so the router is allowed to boot. Next, we need to find out why it takes so long in the first place. In our environment it at least starts now.

Regards,
Remi

[1] https://github.com/apache/cloudstack/pull/1291

On 04/01/16 11:41, "Andrei Mikhailovsky" <and...@arhont.com> wrote:

>Hello guys,
>
>Tried the users mailing list without any luck. Perhaps the dev guys know if
>this issue is being looked at for the next release?
>
>I've just upgraded to 4.6.2 and have similar issues with three virtual routers
>out of 22 in total. They are all failing exactly the same way as described
>here.
>
>Has anyone found a permanent workaround for this issue?
>
>Thanks
>
>Andrei
>
>----- Original Message -----
>> From: "Stephan Seitz" <s.se...@secretresearchfacility.com>
>> To: "users" <us...@cloudstack.apache.org>
>> Sent: Monday, 30 November, 2015 19:53:57
>> Subject: Re: upgrading 4.5.2 -> 4.6.0 virtualrouter upgrade timeout
>
>> Does anybody else experience problems due to (very) slow deployment of
>> VRs?
>>
>>
>> On Tuesday, 24.11.2015 at 16:31 +0100, Stephan Seitz wrote:
>>> Update / FYI:
>>> After faking the particular VR upgrade in SQL, I tried to restart that
>>> network, and it always fails. To me it looks like update_config.py -
>>> which takes almost all CPU resources - runs way longer than any
>>> watchdog will accept.
>>>
>>> I'm able to mitigate that by very nasty workarounds:
>>> a) start the router
>>> b) wait until it is provisioned
>>> c) restart cloudstack-management
>>> d) update vm_instance
>>>    set state='Running',
>>>        power_state='PowerOn'
>>>    where name = 'r-XXX-VM';
>>> e) once:
>>>    update domain_router
>>>    set template_version="Cloudstack Release 4.6.0 Wed Nov 4 08:22:47 UTC 2015",
>>>        scripts_version="546c9e7ac38e0aa16ecc498899dac8e2"
>>>    where id=XXX;
>>> f) wait until update_config.py finishes (for me that's about 15
>>>    minutes)
>>>
>>> Since I expect the need for VR restarts in the future, this behaviour
>>> is somewhat unsatisfying. It needs a lot of error-prone intervention.
>>>
>>> I'm quite unsure whether it was introduced by the upgrade, or whether
>>> the particular VR simply had not been restarted after being configured
>>> with lots of IPs and rules.
>>>
>>>
>>> On Tuesday, 24.11.2015 at 12:29 +0100, Stephan Seitz wrote:
>>> > Hi List!
>>> >
>>> > After upgrading from 4.5.2 to 4.6.0 I faced a problem with one
>>> > virtual router. This particular VR has about 10 IPs w/ LB and FW
>>> > rules defined. During the upgrade process, after about 4-5 minutes, a
>>> > watchdog kicks in and kills the respective VR due to no response.
>>> >
>>> > So far I didn't find any timeout value in the global settings.
>>> > Temporarily setting network.router.EnableServiceMonitoring to false
>>> > doesn't change the behaviour.
>>> >
>>> > Any help on how to mitigate that nasty timeout would be really
>>> > appreciated :)
>>> >
>>> > cheers,
>>> >
>>> > Stephan
>>> >
>>> > From within the VR, the logs show:
>>> >
>>> > 2015-11-24 11:24:33,807 CsFile.py search:123 Searching for
>>> > dhcp-range=interface:eth0,set:interface and replacing with
>>> > dhcp-range=interface:eth0,set:interface-eth0,10.10.22.1,static
>>> > 2015-11-24 11:24:33,808 merge.py load:56 Creating data bag type
>>> > guestnetwork
>>> > 2015-11-24 11:24:33,808 CsFile.py search:123 Searching for
>>> > dhcp-option=tag:interface-eth0,15 and replacing with
>>> > dhcp-option=tag:interface-eth0,15,heinlein.cloudservice
>>> > 2015-11-24 11:24:33,808 CsFile.py search:123 Searching for
>>> > dhcp-option=tag:interface-eth0,6 and replacing with
>>> > dhcp-option=tag:interface-eth0,6,10.10.22.1,195.10.208.2,91.198.250.2
>>> > 2015-11-24 11:24:33,809 CsFile.py search:123 Searching for
>>> > dhcp-option=tag:interface-eth0,3, and replacing with
>>> > dhcp-option=tag:interface-eth0,3,10.10.22.1
>>> > 2015-11-24 11:24:33,809 CsFile.py search:123 Searching for
>>> > dhcp-option=tag:interface-eth0,1, and replacing with
>>> > dhcp-option=tag:interface-eth0,1,255.255.255.0
>>> > 2015-11-24 11:24:33,810 CsHelper.py execute:160 Executing: service
>>> > dnsmasq restart
>>> >
>>> > ==> /var/log/messages <==
>>> > Nov 24 11:24:34 r-504-VM shutdown[6752]: shutting down for system
>>> > halt
>>> >
>>> > Broadcast message from root@r-504-VM (Tue Nov 24 11:24:34 2015):
>>> >
>>> > The system is going down for system halt NOW!
>>> > Nov 24 11:24:35 r-504-VM KVP: KVP starting; pid is:6844
>>> >
>>> > ==> /var/log/cloud.log <==
>>> > /opt/cloud/bin/vr_cfg.sh: line 60:  6603 Killed
>>> > /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>> >
>>> > ==> /var/log/messages <==
>>> > Nov 24 11:24:35 r-504-VM cloud: VR config: executing failed:
>>> > /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>> >
>>> > ==> /var/log/cloud.log <==
>>> > Tue Nov 24 11:24:35 UTC 2015 : VR config: executing failed:
>>> > /opt/cloud/bin/update_config.py vm_dhcp_entry.json
>>> > Connection to 169.254.2.192 closed by remote host.
>>> > Connection to 169.254.2.192 closed.
>>> >
>>> >
>>> > The management-server.log shows:
>>> >
>>> > 2015-11-24 12:24:43,015 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>>> > (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Done executing
>>> > com.cloud.vm.VmWorkStart for job-5164
>>> > 2015-11-24 12:24:43,017 INFO [o.a.c.f.j.i.AsyncJobMonitor]
>>> > (Work-Job-Executor-1:ctx-ad9e4658 job-5163/job-5164) Remove job-5164
>>> > from job monitoring
>>> > 2015-11-24 12:24:43,114 ERROR [c.c.a.ApiAsyncJobDispatcher]
>>> > (API-Job-Executor-1:ctx-760da779 job-5163) Unexpected exception while
>>> > executing org.apache.cloudstack.api.command.admin.router.StartRouterCmd
>>> > com.cloud.exception.AgentUnavailableException: Resource [Host:1] is
>>> > unreachable: Host 1: Unable to start instance due to Unable to start
>>> > VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying
>>> >     at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1121)
>>> >     at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4580)
>>> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> >     at java.lang.reflect.Method.invoke(Method.java:606)
>>> >     at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
>>> >     at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4736)
>>> >     at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
>>> >     at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:537)
>>> >     at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>>> >     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>>> >     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>>> >     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>>> >     at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>>> >     at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:494)
>>> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >     at java.lang.Thread.run(Thread.java:745)
>>> > Caused by: com.cloud.utils.exception.ExecutionException: Unable to start
>>> > VM[DomainRouter|r-504-VM] due to error in finalizeStart, not retrying
>>> >     at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1085)
>>> >     at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4580)
>>> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >     ... 18 more
>>> > 2015-11-24 12:24:43,115 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>>> > (API-Job-Executor-1:ctx-760da779 job-5163) Complete async job-5163,
>>> > jobStatus: FAILED, resultCode: 530, result:
>>> > org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],
>>> > "errorcode":530,"errortext":"Resource [Host:1] is unreachable:
>>> > Host 1: Unable to start instance due to Unable to start
>>> > VM[DomainRouter|r-504-VM] due to error in finalizeStart, not
>>> > retrying"}
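
[Editor's note] Remi's suggestion at the top of the thread - bumping the router.aggregation.command.each.timeout global setting - can be applied through the UI, the updateConfiguration API, or directly against the management server's database, as the thread's own workaround already does for vm_instance and domain_router. The following is a minimal sketch only, assuming the standard `cloud.configuration` table of a CloudStack 4.x install; the value of 30 is just an example taken from Remi's "say 15-30" range, and cloudstack-management must be restarted before the new value takes effect.

```sql
-- Sketch: raise the per-command aggregation timeout so a VR with many
-- IPs/rules can finish running update_config.py before the watchdog kills it.
UPDATE cloud.configuration
   SET value = '30'
 WHERE name = 'router.aggregation.command.each.timeout';

-- Verify the change before restarting cloudstack-management:
SELECT name, value
  FROM cloud.configuration
 WHERE name = 'router.aggregation.command.each.timeout';
```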