Hi Rohit Finally we had some time to dig deeper into this issue. The /bin/bash issue was a typo on our side, sorry!
System VM was destroyed/recreated several times. But something with the ssh keys is still wrong. The KVM is still asking for the pass phrase when ssh into system vms. *# Management Server* mysql> select * from configuration where name = "ssh.publickey" \G; *************************** 1. row *************************** category: Hidden instance: DEFAULT component: management-server name: ssh.publickey value: XiVYy4PpuS4nKoQZRsj+3VQr0dxXFt+s15ZVXiIbaacJ5RsLwlSEO75kbPaL7pDAhyZbcCtC3ICSDUAeJysZ958cTVrl390Lk0UtRoLOUKlNIuOEVzdJpZDmsrm90ZK+OYMaHKf2cVz3xekZv/BS5Kpr2dnieUNZlQlgCXgldwENNZ7+MSlPRLiJ+7/SEBtfn9VTUkXvedhbBtIh3fxexOR/xN+zGRTyotlYANxiypklrlZp6/364JfXoeQviE6aVzsVDboALW+ov6gFj8ziO7vhoUN9M6rJohGnPawjgMCBasVCjWqoFaezKlf/0vEDf3NkKOoi1kqvSDGOumCFyQ7+9BOBq8qdqnP3i4e4aJ3vLdVWlRKRaUps08xOP8+j36L9lrO0YAkGXH+bAQXzQMqIFLvjRGX+zOnEikRQ2Zv0JJSlAvftF13BQrTt1SV/ttDa0qo7fzqF1WUijZb/PcKLGUeC30U9kG4JLmrHcuaoe43I08qkGJEofUetZGQLIYhsUqXiVSflGEGfJTWJgMGKAVbGUF5a description: Public key for the entire CloudStack default_value: NULL updated: NULL scope: NULL is_dynamic: 0 1 row in set (0.00 sec) mysql> select * from configuration where name = "ssh.privatekey" \G; *************************** 1. row *************************** category: Hidden instance: DEFAULT component: management-server name: ssh.privatekey value: j+Z5ckPXUq7lwo2L4OYaaiDOA/fdUpadX3h7E/DA1rwPRSz967eckCbRRpCtCWxfQjtAgPNfLgGIQHX4K7o9yF9DBXG0ESXoECyrRAOTmfZdvdSZceRGd9TYcjwCAn4s8GKGQRKeQSQLL8Pusc/PmYLISIomkOHhZB3lp1bfniKC6uf6RwhRFNfX38ZeqVhRNgagK2hxOo67Ev7uUX1NIUWw/ROq/RC7bG58DD1uhUOX5Bm5wVMSmf3Pv7s/z2t7ZGcX8YMnW ..... .... .... .... /uPooWhUQpcG8Q8APvfEKMp0BVJATh4Gk4qkgnR4CHyefUYCDHmWZnpuCj1ra7Kn5LV7AdWG1+C7bl7hSmpxE8StLfEwe72zuPJsI86LekxhJJpsorjvSFYklJhGizSs4uuyRP2Dqayx1qFkFJe26AiWb86ld2gQ2Vwj3ozgHX2OMp60jQM8ZVM9VuroOEmiOZ5gkfhFFIf5Y78WcZU2ScSCAHVavNAIsfX+i01iNxwnXtrmrA9ctlZGuZZRR3TklAXQz4M69SFFYuo5e7OsJ2/Tr9K7AhUHA8kwR+BxPLFGRlJpyhmLUuPartDKRBAp2BwU1XsFiIs8Y6pn9wkRJaeeDdpUt9zPCYMHaHtXQvFMe+d96J4HMPvfLgGnd55Hz7GKsglX9su7XpuAqzkuPM5u6GY5LLAtXUOqE2Wm855sLRW1kbHnX8pAfIe9wQbfUQ9wsfCh66w1duce0vyuj4A7A8HC4V8+U2qXjwMIhLpfO5phfUy4drlUzBzx9ZMK+JUYP6kszj61M09psoYggUBXMYCfg6oWJSjA8= description: Private key for the entire CloudStack default_value: NULL updated: NULL scope: NULL is_dynamic: 0 1 row in set (0.00 sec) *# Secondary Storage VM (note that the key is equal to the management server database entry ssh.publickey)* root@s-252-VM:~# cat /root/.ssh/authorized_keys XiVYy4PpuS4nKoQZRsj+3VQr0dxXFt+s15ZVXiIbaacJ5RsLwlSEO75kbPaL7pDAhyZbcCtC3ICSDUAeJysZ958cTVrl390Lk0UtRoLOUKlNIuOEVzdJpZDmsrm90ZK+OYMaHKf2cVz3xekZv/BS5Kpr2dnieUNZlQlgCXgldwENNZ7+MSlPRLiJ+7/SEBtfn9VTUkXvedhbBtIh3fxexOR/xN+zGRTyotlYANxiypklrlZp6/364JfXoeQviE6aVzsVDboALW+ov6gFj8ziO7vhoUN9M6rJohGnPawjgMCBasVCjWqoFaezKlf/0vEDf3NkKOoi1kqvSDGOumCFyQ7+9BOBq8qdqnP3i4e4aJ3vLdVWlRKRaUps08xOP8+j36L9lrO0YAkGXH+bAQXzQMqIFLvjRGX+zOnEikRQ2Zv0JJSlAvftF13BQrTt1SV/ttDa0qo7fzqF1WUijZb/PcKLGUeC30U9kG4JLmrHcuaoe43I08qkGJEofUetZGQLIYhsUqXiVSflGEGfJTWJgMGKAVbGUF5a root@s-252-VM:~# ifconfig eth0 eth0 Link encap:Ethernet HWaddr 0e:00:a9:fe:00:18 inet addr:169.254.0.24 Bcast:169.254.255.255 Mask:255.255.0.0 *#KVM Hosts* root@kvm704:~/.ssh# cat /root/.ssh/id_rsa.pub.cloud XiVYy4PpuS4nKoQZRsj+3VQr0dxXFt+s15ZVXiIbaacJ5RsLwlSEO75kbPaL7pDAhyZbcCtC3ICSDUAeJysZ958cTVrl390Lk0UtRoLOUKlNIuOEVzdJpZDmsrm90ZK+OYMaHKf2cVz3xekZv/BS5Kpr2dnieUNZlQlgCXgldwENNZ7+MSlPRLiJ+7/SEBtfn9VTUkXvedhbBtIh3fxexOR/xN+zGRTyotlYANxiypklrlZp6/364JfXoeQviE6aVzsVDboALW+ov6gFj8ziO7vhoUN9M6rJohGnPawjgMCBasVCjWqoFaezKlf/0vEDf3NkKOoi1kqvSDGOumCFyQ7+9BOBq8qdqnP3i4e4aJ3vLdVWlRKRaUps08xOP8+j36L9lrO0YAkGXH+bAQXzQMqIFLvjRGX+zOnEikRQ2Zv0JJSlAvftF13BQrTt1SV/ttDa0qo7fzqF1WUijZb/PcKLGUeC30U9kG4JLmrHcuaoe43I08qkGJEofUetZGQLIYhsUqXiVSflGEGfJTWJgMGKAVbGUF5aroot@kvm704 root@kvm704:~/.ssh# ssh -p 3922 -q -o StrictHostKeyChecking=no -i /root/.ssh/id_rsa.cloud root@169.254.0.24 Enter passphrase for key '/root/.ssh/id_rsa.cloud': *# Router which was not redeployed * root@r-191-VM:~# cat .ssh/authorized_keys ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDoZcfuh/R2i/n69hFBJEfKKtTQQAs7iOIC9PxFUWrFONk9m4R8Atj8o5wLyHflhe10iMHUJTGH84/J4min1r7zU4xGqzOBR3CKuz6dowmC+IQ38iyJfTOC9L1eIl8AebmKNwSSgjS6nc6sK13iFpgr8mSIr7ExcXPxYgWNgbQAm/HvLVc8Ivf2FZR8f26i3EyjsLVKpHoyvVMeVdoAHBdQB/NiUslAz2fz3cPXxVSxpZobd7iy/uJNVK1CHvqhGYLrXym9cjfM0BOUhN80NN9SvAiyj9D7PHiwcDBzMSSPljxwF6rPY7BXHq+7AVvzVPTJYEuBVQllYn1RrjCS/Fdr cloud@cloudstack If we understand correctly the public and private key are encrypted in the management database. Based on the Source Code master# ag ssh.publickey server/src/com/cloud/server/ConfigurationServerImpl.java 761: "VALUES ('Hidden','DEFAULT', 'management-server','ssh.publickey', '" +*DBEncryptionUtil.encrypt(publicKey)* + 855: String pubKey = _configDao.getValue("ssh.publickey"); - It looks like the rsa key is still encrypted on the KVM Server and on the System VMs? - Could this issue be related with the keystore cleanup? - We are running out of ideas....arrr regards Martin On 08/31/2016 07:31 PM, Rohit Yadav wrote: > Hi Martin, > > > Looks like something went wrong with the CloudStack generated public/private > keys. You should not be prompted for any password when trying to ssh to > systemvms on port 3922. Can you try to destroy the VRs or create a new > network which would create a new VR. > > > Regarding the /bin/bash not found issue, that looks fishy. /bin/bash should > be available on both Ubuntu 14.04 based KVM hosts and also on systemvms > (which are Debian7 based). > > > Regards. > > ________________________________ > From: martin kolly <martin.ko...@senselan.ch> > Sent: 29 August 2016 20:10:32 > To: dev@cloudstack.apache.org > Subject: Re: CS 4.9 NIO Selector wait time PR-1601 > > > We have done more investigations. An error happens when the script > /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh is > executed: > > MGMT LOGS: > 2016-08-29 16:00:11,342 DEBUG [c.c.a.t.Request] > (AgentManager-Handler-12:null) (logid:) Seq 8-1837187172990452475: > Processing: { Ans: , MgmtId: 90520741415395, via: 8, Ver: v1, Flags: > 10, > [{"com.cloud.agent.api.GetRouterAlertsAnswer":{"result":false,"details":"/usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh: > 1: /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh: > h!/bin/bash: not found","wait":0}}] } > 2016-08-29 16:00:11,345 DEBUG [c.c.a.t.Request] > (RouterStatusMonitor-1:ctx-9cdc1e33) (logid:5f04bf82) Seq > 8-1837187172990452475: Received: { Ans: , MgmtId: 90520741415395, via: > 8(kvm702), Ver: v1, Flags: 10, { GetRouterAlertsAnswer } } > *2016-08-29 16:00:11,345 DEBUG [c.c.a.m.AgentManagerImpl] > (RouterStatusMonitor-1:ctx-9cdc1e33) (logid:5f04bf82) Details from > executing class com.cloud.agent.api.routing.GetRouterAlertsCommand: > /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh: 1: > /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh: > h!/bin/bash: not found* > 2016-08-29 16:00:11,345 WARN > [c.c.n.r.VirtualNetworkApplianceManagerImpl] > (RouterStatusMonitor-1:ctx-9cdc1e33) (logid:5f04bf82) Unable to get > alerts from router r-221-VM > /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh: 1: > /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh: > h!/bin/bash: not found > 2016-08-29 16:00:11,411 INFO [o.a.c.e.o.NetworkOrchestrator] > (Network-Scavenger-1:ctx-64537bea) (logid:0c5d8104) > NetworkGarbageCollector uses '600' seconds for GC interval. > 2016-08-29 16:00:11,416 DEBUG [o.a.c.e.o.NetworkOrchestrator] > (Network-Scavenger-1:ctx-64537bea) (logid:0c5d8104) We found network 219 > to be free for the first time. Adding it to the list: 1472479211 > 2016-08-29 16:00:12,392 DEBUG [c.c.s.StatsCollector] > (StatsCollector-6:ctx-c1aeb528) (logid:a364cc05) StorageCollector is > running... > 2016-08-29 16:00:12,403 DEBUG [c.c.h.o.r.Ovm3HypervisorGuru] > (StatsCollector-6:ctx-c1aeb528) (logid:a364cc05) > getCommandHostDelegation: class com.cloud.agent.api.GetStorageStatsCommand > > For debugging we modified the "router_proxy.sh" slightly on the KVM Host: > > kvm# grep -A 2 DEBUG > /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh -n > 46:# DEBUG CS Upgrade > *47-echo "$(date): ssh -p 3922 -q -o StrictHostKeyChecking=no -i $cert > root@$domRIp /opt/cloud/bin/$script $*" >> /tmp/router.txt* > 48-ssh -p 3922 -q -o StrictHostKeyChecking=no -i $cert root@$domRIp > "/opt/cloud/bin/$script $*" > > kvm# tail /tmp/router.txt > Mon Aug 29 16:10:11 CEST 2016: ssh -p 3922 -q -o > StrictHostKeyChecking=no -i /root/.ssh/id_rsa.cloud root@169.254.2.67 > /opt/cloud/bin/netusage.sh -g > Mon Aug 29 16:10:11 CEST 2016: ssh -p 3922 -q -o > StrictHostKeyChecking=no -i /root/.ssh/id_rsa.cloud root@169.254.2.67 > /opt/cloud/bin/netusage.sh -g > > On the router with IP 169.254.2.67 we see the failed logins as well: > > root@r-221-VM:~# tail -n 2 /var/log/auth.log > Aug 29 14:10:11 r-221-VM sshd[7074]: Connection closed by 169.254.0.1 > [preauth] > Aug 29 14:10:11 r-221-VM sshd[7075]: Connection closed by 169.254.0.1 > [preauth] > > These commands did not succeed. If we execute the command manually we > get prompted for the key passphrase. > > kvm# ssh -p 3922 -o StrictHostKeyChecking=no -i /root/.ssh/id_rsa.cloud > root@169.254.2.67 > Enter passphrase for key '/root/.ssh/id_rsa.cloud': > > can someone confirm that the rsa key is protected with a passhrase? this > looks suspicious to us... > > regards > martin > > > > > > > > On 08/29/2016 10:09 AM, martin kolly wrote: >> thanks Simon and Rohit for the valuable inputs! After applying the >> following procedure the ssl errors are gone and host state is UP. >> >> # service cloudstack-management stop >> >> mysql> delete from configuration where name = "ssl.keystore" ; >> # mv /etc/cloudstack/management/cloudmanagementserver.keystore >> /etc/cloudstack/management/cloudmanagementserver.keystore.old >> >> # service cloudstack-management start >> >> # file /etc/cloudstack/management/cloudmanagementserver.keystore >> /etc/cloudstack/management/cloudmanagementserver.keystore: Java KeyStore >> # file /etc/cloudstack/management/cloudmanagementserver.keystore.old >> /etc/cloudstack/management/cloudmanagementserver.keystore.old: data >> >> #java -version >> java version "1.7.0_111" >> >> However new routers are not deployed, we still see entries that KVM >> hosts are unreachable (logs underneath). >> >> After starting a new router it is shown with "virsh list" on the KVM: >> kvm# virsh list | grep r-233-VM >> 111 r-233-VM running >> >> After some seconds the router is deleted, >> >> kvm# virsh list | grep r-233-VM >> # >> >> kvm#virsh -v >> 1.2.2 >> >> (libvirt was already restarted) >> >> >> >> ############ Logs/Info KVM ################# >> >> kvm # dpkg -l | grep cloudstack >> ii cloudstack-agent >> 4.9.0 all CloudStack agent >> ii cloudstack-common >> 4.9.0 all A common package >> which contains files which are shared by several CloudStack packages >> >> kvm# df -kh | grep cloud >> 10.100.12.9:/export/cloud 188G 137G 51G 73% >> /mnt/5db02c19-1e8f-3591-bdb4-02608362521e >> >> kvm# tail -f /var/log/cloudstack/agent/agent.log >> ... >> 2016-08-29 06:55:06.203+0000: 4425: error : >> qemuMonitorFindBalloonObjectPath:1032 : internal error: Cannot determine >> balloon device path >> 2016-08-29 06:55:06.223+0000: 4426: error : >> qemuMonitorFindBalloonObjectPath:1032 : internal error: Cannot determine >> balloon device path >> 2016-08-29 06:55:24.062+0000: 4425: warning : qemuDomainObjTaint:1628 : >> Domain id=112 name='r-233-VM' uuid=d9dcd37a-242d-43ac-a18e-79a4bfa86ebb >> is tainted: high-privileges >> 2016-08-29 06:55:49.066+0000: 4423: error : qemuMonitorIO:656 : internal >> error: End of file from monitor >> >> >> >> >> ############ Logs/Info Management Server ######## >> >> # dpkg -l | grep cloudstack >> ii cloudstack-agent >> 4.9.0 all CloudStack agent >> ii cloudstack-common >> 4.9.0 all A common package >> which contains files which are shared by several CloudStack packages >> ii cloudstack-management >> 4.9.0 all CloudStack server library >> ii cloudstack-usage >> 4.9.0 all CloudStack usage monitor >> >> # mysql -u root cloud -e "select id,name,path from cloud.storage_pool >> where pool_type='Filesystem'" >> +----+----------------------+-------------------------+ >> | id | name | path | >> +----+----------------------+-------------------------+ >> | 1 | kvm704 Local Storage | /var/lib/libvirt/images | >> | 2 | kvm701 Local Storage | /var/lib/libvirt/images | >> | 3 | kvm702 Local Storage | /var/lib/libvirt/images | >> | 4 | kvm703 Local Storage | /var/lib/libvirt/images | >> +----+----------------------+-------------------------+ >> >> #tail -f /var/log/cloudstack/management/management-server.log | grep -v >> DEBUG >> ... >> 2016-08-29 08:55:42,484 WARN [o.a.c.alerts] >> (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a) >> (logid:337b354e) alertType:: 9 // dataCenterId:: 1 // podId:: 1 // >> clusterId:: null // message:: Command: >> com.cloud.agent.api.GetDomRVersionCommand failed while starting virtual >> router >> 2016-08-29 08:55:42,494 ERROR >> [c.c.n.r.VirtualNetworkApplianceManagerImpl] >> (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a) >> (logid:337b354e) GetDomRVersionCmd failed >> 2016-08-29 08:55:42,495 WARN >> [c.c.n.r.VirtualNetworkApplianceManagerImpl] >> (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a) >> (logid:337b354e) Command: com.cloud.agent.api.GetDomRVersionCommand >> failed while starting virtual router >> 2016-08-29 08:55:42,495 INFO [c.c.v.VirtualMachineManagerImpl] >> (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a) >> (logid:337b354e) The guru did not like the answers so stopping >> VM[DomainRouter|r-233-VM] >> 2016-08-29 08:55:49,184 INFO [o.a.c.f.j.i.AsyncJobManagerImpl] >> (AsyncJobMgr-Heartbeat-1:ctx-5bd7c179) (logid:86c0376e) Begin cleanup >> expired async-jobs >> 2016-08-29 08:55:49,192 INFO [o.a.c.f.j.i.AsyncJobManagerImpl] >> (AsyncJobMgr-Heartbeat-1:ctx-5bd7c179) (logid:86c0376e) End cleanup >> expired async-jobs >> 2016-08-29 08:55:50,861 ERROR [c.c.v.VirtualMachineManagerImpl] >> (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a) >> (logid:337b354e) Failed to start instance VM[DomainRouter|r-233-VM] >> com.cloud.utils.exception.ExecutionException: Unable to start >> VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart, >> not retrying >> at >> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1084) >> at >> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592) >> at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:606) >> at >> com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107) >> at >> com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4753) >> at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102) >> at >> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554) >> at >> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49) >> at >> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56) >> at >> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103) >> at >> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53) >> at >> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46) >> at >> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502) >> at >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >> at java.util.concurrent.FutureTask.run(FutureTask.java:262) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >> at java.lang.Thread.run(Thread.java:745) >> 2016-08-29 08:55:51,136 ERROR [c.c.v.VmWorkJobHandlerProxy] >> (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a) >> (logid:337b354e) Invocation exception, caused by: >> com.cloud.exception.AgentUnavailableException: Resource [Host:8] is >> unreachable: Host 8: Unable to start instance due to Unable to start >> VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart, >> not retrying >> 2016-08-29 08:55:51,136 INFO [c.c.v.VmWorkJobHandlerProxy] >> (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a) >> (logid:337b354e) Rethrow exception >> com.cloud.exception.AgentUnavailableException: Resource [Host:8] is >> unreachable: Host 8: Unable to start instance due to Unable to start >> VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart, >> not retrying >> 2016-08-29 08:55:51,137 ERROR [c.c.v.VmWorkJobDispatcher] >> (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739) (logid:337b354e) >> Unable to complete AsyncJobVO {id:2739, userId: 2, accountId: 2, >> instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart, >> cmdInfo: >> rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAACAAAAAAAAAAIAAAAAAAAA6XQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwc3IAEWphdmEudXRpbC5IYXNoTWFwBQfawcMWYNEDAAJGAApsb2FkRmFjdG9ySQAJdGhyZXNob2xkeHA_QAAAAAAADHcIAAAAEAAAAAF0AA5SZXN0YXJ0TmV0d29ya3QAP3JPMEFCWE55QUJGcVlYWmhMbXhoYm1jdVFtOXZiR1ZoYnMwZ2NvRFZuUHJ1QWdBQldnQUZkbUZzZFdWNGNBRXhw, >> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, >> result: null, initMsid: 90520741415395, completeMsid: null, lastUpdated: >> null, lastPolled: null, created: Mon Aug 29 08:55:21 CEST 2016}, job >> origin:2738 >> com.cloud.exception.AgentUnavailableException: Resource [Host:8] is >> unreachable: Host 8: Unable to start instance due to Unable to start >> VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart, >> not retrying >> at >> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1120) >> at >> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592) >> at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:606) >> at >> com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107) >> at >> com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4753) >> at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102) >> at >> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554) >> at >> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49) >> at >> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56) >> at >> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103) >> at >> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53) >> at >> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46) >> at >> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502) >> at >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >> at java.util.concurrent.FutureTask.run(FutureTask.java:262) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >> at java.lang.Thread.run(Thread.java:745) >> Caused by: com.cloud.utils.exception.ExecutionException: Unable to >> start VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in >> finalizeStart, not retrying >> at >> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1084) >> at >> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592) >> at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source) >> ... 17 more >> 2016-08-29 08:55:51,168 INFO [o.a.c.f.j.i.AsyncJobMonitor] >> (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739) (logid:337b354e) >> Remove job-2739 from job monitoring >> 2016-08-29 08:55:53,197 INFO [o.a.c.f.j.i.AsyncJobMonitor] >> (Work-Job-Executor-113:ctx-6411b64b job-2738/job-2740) (logid:db064d99) >> Add job-2740 into job monitoring >> 2016-08-29 08:55:53,218 INFO [o.a.c.f.j.i.AsyncJobMonitor] >> (Work-Job-Executor-113:ctx-6411b64b job-2738/job-2740) (logid:337b354e) >> Remove job-2740 from job monitoring >> 2016-08-29 08:55:53,247 ERROR [c.c.a.ApiAsyncJobDispatcher] >> (API-Job-Executor-68:ctx-d2eb7557 job-2738) (logid:337b354e) Unexpected >> exception while executing >> org.apache.cloudstack.api.command.admin.router.StartRouterCmd >> com.cloud.exception.AgentUnavailableException: Resource [Host:8] is >> unreachable: Host 8: Unable to start instance due to Unable to start >> VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart, >> not retrying >> at >> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1120) >> at >> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592) >> at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:606) >> at >> com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107) >> at >> com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4753) >> at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102) >> at >> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554) >> at >> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49) >> at >> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56) >> at >> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103) >> at >> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53) >> at >> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46) >> at >> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502) >> at >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >> at java.util.concurrent.FutureTask.run(FutureTask.java:262) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >> at java.lang.Thread.run(Thread.java:745) >> Caused by: com.cloud.utils.exception.ExecutionException: Unable to >> start VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in >> finalizeStart, not retrying >> at >> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1084) >> at >> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592) >> at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source) >> ... 17 more >> 2016-08-29 08:55:53,256 INFO [o.a.c.f.j.i.AsyncJobMonitor] >> (API-Job-Executor-68:ctx-d2eb7557 job-2738) (logid:337b354e) Remove >> job-2738 from job monitoring >> >> >> >> >> regards >> Martin >> >> >> >> >> On 08/26/2016 08:47 PM, Rohit Yadav wrote: >>> Hi Martin, >>> >>> >>> I checked my openssl connect output and my management server on port 8250 >>> was indeed returning valid certificates. On further investigation, when >>> connections to management server (NioServer, handled by Link class that >>> reads /etc/cloudstack/management/cloudmanagementserver.keystore) are >>> received it uses a keystore file >>> (/etc/cloudstack/management/cloudmanagementserver.keystore). Please check >>> if that's valid, or backup your keystore file and do what Simon has >>> suggested. >>> >>> >>> >>> For reference, here the output from my 4.9.0 mgmt server running on Ubuntu >>> 14.04.4: >>> >>> >>> $ openssl s_client -tls1 -connect 192.168.1.11:8250 >>> CONNECTED(00000003) >>> depth=0 C = Unknown, O = bluebox, OU = bluebox, CN = Cloudstack User >>> verify error:num=18:self signed certificate >>> verify return:1 >>> depth=0 C = Unknown, O = bluebox, OU = bluebox, CN = Cloudstack User >>> verify return:1 >>> --- >>> Certificate chain >>> 0 s:/C=Unknown/O=bluebox/OU=bluebox/CN=Cloudstack User >>> i:/C=Unknown/O=bluebox/OU=bluebox/CN=Cloudstack User >>> --- >>> Server certificate >>> -----BEGIN CERTIFICATE----- >>> MIIDPzCCAiegAwIBAgIEFHHX2DANBgkqhkiG9w0BAQsFADBQMRAwDgYDVQQGEwdV >>> bmtub3duMRAwDgYDVQQKEwdibHVlYm94MRAwDgYDVQQLEwdibHVlYm94MRgwFgYD >>> VQQDEw9DbG91ZHN0YWNrIFVzZXIwHhcNMTUwNTA2MTM0MjMzWhcNMjUwNTAzMTM0 >>> MjMzWjBQMRAwDgYDVQQGEwdVbmtub3duMRAwDgYDVQQKEwdibHVlYm94MRAwDgYD >>> VQQLEwdibHVlYm94MRgwFgYDVQQDEw9DbG91ZHN0YWNrIFVzZXIwggEiMA0GCSqG >>> SIb3DQEBAQUAA4IBDwAwggEKAoIBAQC0wN8kfOJMzwlbrOnBj/jjvjjIwDVpYLtH >>> WoKkNB+rzzKEUxaYwaQxe6E3M536ZuqcaJBIqcYPwTIkWyulvuHuJpSQak4VbuDV >>> f7dqt5RacLFT0jUciqTvL5QDCrk0uNugKkWgEvNtokVGSBwLPVEcdcGWpku1EpeH >>> vMYmpOkcWgbC8Z9D7QTlVw6oEWbPAtKr+gDrXdOFnpPPI45rteatIIgKm1Q6JjZM >>> qrUKfqt7s8ts6ZgdAN2WmtieSsnUX1su9SJMYg2J8LK7UJeGqiNtE+g944GPqtnW >>> aDldjharq54e79ug2ktxw29I3ulpRD/vgxwZmcPJrePUKwY91KEjAgMBAAGjITAf >>> MB0GA1UdDgQWBBSUJvLY8RL/1fTVqj1rT8136da4yzANBgkqhkiG9w0BAQsFAAOC >>> AQEAgDdFIgLvOH/UgRp2nnFUVcMp+uchSLj8CbCkukJBrUwrmJHp3Os+H1ggk8Vt >>> j3conj06zJBNN/E0J8pcpagE1aR+l4R8WxF3g/Oc7bNyrUlkGSQ82vavg9sEkwHY >>> eQY/4wj8CprICs9JilgZ6keeWNWgAW1goLZSzGVwz5eE0lPuc2Dg3laR5RsuTxie >>> dgQhpbOx3UZun+dhuP5NUHc+KWyrNvSZNN8FruO602KWZwm0Hndl7RVbkNEd0kxq >>> FhFK4Scc2HBrKMUrPTzO1nGCgR1gA015C2MFfmjeW49VTi95WnY8DDG2euUYAtpl >>> lLVuNxJxq7eDJfP/M9kxSIKgrQ== >>> -----END CERTIFICATE----- >>> subject=/C=Unknown/O=bluebox/OU=bluebox/CN=Cloudstack User >>> issuer=/C=Unknown/O=bluebox/OU=bluebox/CN=Cloudstack User >>> --- >>> No client certificate CA names sent >>> Server Temp Key: ECDH, P-256, 256 bits >>> --- >>> SSL handshake has read 1325 bytes and written 331 bytes >>> --- >>> New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-SHA >>> Server public key is 2048 bit >>> Secure Renegotiation IS supported >>> Compression: NONE >>> Expansion: NONE >>> No ALPN negotiated >>> SSL-Session: >>> Protocol : TLSv1 >>> Cipher : ECDHE-RSA-AES256-SHA >>> Session-ID: >>> 57C08E5D841D9B74AB533A0000BDBE4275424640B5B7D1985D438B275598436F >>> Session-ID-ctx: >>> Master-Key: >>> E1DE7D7464766DCECD82088561A7D928519CA20BA91FDF4AA38BB2E51AB7C27D3F1ABA5788ADED2AD49FE8597EC4A344 >>> Key-Arg : None >>> PSK identity: None >>> PSK identity hint: None >>> SRP username: None >>> Start Time: 1472237149 >>> Timeout : 7200 (sec) >>> Verify return code: 18 (self signed certificate) >>> --- >>> >>> >>> >>> >>> Regards. >>> >>> ________________________________ >>> From: martin kolly <martin.ko...@senselan.ch> >>> Sent: 26 August 2016 20:36:02 >>> To: dev@cloudstack.apache.org >>> Subject: Re: CS 4.9 NIO Selector wait time PR-1601 >>> >>> >>> good point, thanks Simon! with openssl we receive a response on port 8250. >>> >>> # telnet 10.100.12.10 8250 >>> Trying 10.100.12.10... >>> Connected to 10.100.12.10. >>> Escape character is '^]'. >>> Connection closed by foreign host. >>> >>> # nc -zv 10.100.12.10 8250 >>> Connection to 10.100.12.10 8250 port [tcp/*] succeeded! >>> >>> # openssl s_client -tls1 -connect 10.100.12.10:8250 >>> CONNECTED(00000003) >>> write:errno=104 >>> --- >>> no peer certificate available >>> --- >>> No client certificate CA names sent >>> --- >>> SSL handshake has read 0 bytes and written 0 bytes >>> --- >>> New, (NONE), Cipher is (NONE) >>> Secure Renegotiation IS NOT supported >>> Compression: NONE >>> Expansion: NONE >>> SSL-Session: >>> Protocol : TLSv1 >>> Cipher : 0000 >>> Session-ID: >>> Session-ID-ctx: >>> Master-Key: >>> Key-Arg : None >>> PSK identity: None >>> PSK identity hint: None >>> SRP username: None >>> Start Time: 1472223447 >>> Timeout : 7200 (sec) >>> Verify return code: 0 (ok) >>> --- >>> >>> >>> On 08/26/2016 04:49 PM, Simon Weller wrote: >>>> Martin, >>>> >>>> >>>> Are you able to actually telnet to 8250 from the host to the mgmt server? >>>> >>>> >>>> - Si >>>> >>>> >>>> ________________________________ >>>> From: martin kolly <martin.ko...@senselan.ch> >>>> Sent: Friday, August 26, 2016 9:41 AM >>>> To: dev@cloudstack.apache.org >>>> Subject: Re: CS 4.9 NIO Selector wait time PR-1601 >>>> >>>> Hi Rohit >>>> >>>> We highly appreciate your efforts! Unfortunately it still does not work. >>>> - ulimit is increased on mgmt server >>>> - jar file replaced >>>> - we confirm that cloudstack-agent 4.9.0 is installed >>>> >>>> MGMT Server >>>> # wget >>>> https://github.com/rhtyd/cloudstack/releases/download/4.9.0-nioinbound/cloud-utils-4.9.0.jar >>>> -O cloud-utils-4.9.0.jar.patch >>>> # md5sum cloud-utils-4.9.0.jar.patch >>>> c4496f42cc6741f562ac645c3a3d8a0c cloud-utils-4.9.0.jar.patch >>>> # md5sum >>>> /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar >>>> c4496f42cc6741f562ac645c3a3d8a0c >>>> /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar >>>> >>>> # ulimit -a >>>> core file size (blocks, -c) 0 >>>> data seg size (kbytes, -d) unlimited >>>> scheduling priority (-e) 0 >>>> file size (blocks, -f) unlimited >>>> pending signals (-i) 64109 >>>> max locked memory (kbytes, -l) 64 >>>> max memory size (kbytes, -m) unlimited >>>> open files (-n) 10240 >>>> pipe size (512 bytes, -p) 8 >>>> POSIX message queues (bytes, -q) 819200 >>>> real-time priority (-r) 0 >>>> stack size (kbytes, -s) 8192 >>>> cpu time (seconds, -t) unlimited >>>> max user processes (-u) 64109 >>>> virtual memory (kbytes, -v) unlimited >>>> file locks (-x) unlimited >>>> >>>> KVM Server >>>> # wget >>>> https://github.com/rhtyd/cloudstack/releases/download/4.9.0-nioinbound/cloud-utils-4.9.0.jar >>>> -O cloud-utils-4.9.0.jar.patch >>>> # md5sum cloud-utils-4.9.0.jar.patch >>>> c4496f42cc6741f562ac645c3a3d8a0c cloud-utils-4.9.0.jar.patch >>>> # md5sum /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar >>>> c4496f42cc6741f562ac645c3a3d8a0c >>>> /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar >>>> >>>> # apt-cache policy cloudstack-agent >>>> cloudstack-agent: >>>> Installed: 4.9.0 >>>> Candidate: 4.9.0 >>>> Version table: >>>> *** 4.9.0 0 >>>> 500 http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/ >>>> ./ Packages >>>> 100 /var/lib/dpkg/status >>>> >>>> The logs are attached. By the way: the error message "Caught the Exception >>>> in VmIpFetchTask" was already there with 4.8 release. >>>> >>>> Thanks >>>> Martin >>>> >>>> On 08/25/2016 07:44 PM, Rohit Yadav wrote: >>>> >>>> Hi Martin, >>>> >>>> >>>> Thanks for sharing. Alright, I'm not sure what's causing issue but based >>>> on the logs seems like only KVM agents are having issues while connecting >>>> to mgmt server as I don't see any Nio related exceptions in the management >>>> server logs. >>>> >>>> >>>> I could not see the cloudstack-agent version in the logs, I'm assuming >>>> that they were all upgraded to 4.9.0, and there are no conflicting jars at >>>> /usr/share/cloudstack-agent/lib. >>>> >>>> >>>> First, can you make sure mgmt server has enough ulimit. I found that >>>> Ubuntu/Debian's init.d script don't override this while CentOS >>>> initd/systemd script sets ulimit. On your mgmt server, edit >>>> /etc/init.d/cloudstack-management and add ulimit -n 10240 just before the >>>> mgmt server is started in the 'state' section (for me it was at around >>>> line #147 where it logs a message that it's starting the >>>> cloudstack-management server). >>>> >>>> >>>> Next, if this still does not solve the issue -- I created a special >>>> cloud-utils.jar for you that you need to place on your mgmt server and on >>>> the KVM agents and restart the mgmt server. This will increase verbosity >>>> of the error while reduce the Nio polling loop timeout (from 100ms to >>>> 10ms). On KVM agents, the error from the logs is that during SSL handshake >>>> inbound connection/stream gets closed, and we want to know the exception >>>> message. Please get the jar from here: >>>> >>>> https://github.com/rhtyd/cloudstack/releases/tag/4.9.0-nioinbound and >>>> place them at: >>>> >>>> /usr/share/cloudstack-agent/lib/ (on kvm host) >>>> >>>> /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/ (on mgmt >>>> server host) >>>> >>>> >>>> Let me know what worked for you, and if it still failed can you share the >>>> mgmt server and agent logs once again. Thanks. >>>> >>>> >>>> Regards. >>>> >>>> ________________________________ >>>> From: martin kolly >>>> <martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch> >>>> Sent: 25 August 2016 20:50:08 >>>> To: dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org> >>>> Subject: Re: CS 4.9 NIO Selector wait time PR-1601 >>>> >>>> Hi Rohit >>>> >>>> We are running java version 1.7.0.111 on KVM and management server. >>>> mgmt# java -version >>>> java version "1.7.0_111" >>>> kvm# java -version >>>> java version "1.7.0_111" >>>> >>>> We get the same error message. Attached are the logs with TRACE enabled. >>>> >>>> "success consists of going from failure to failure without loss of >>>> enthusiasm." >>>> >>>> regards >>>> martin >>>> >>>> On 08/25/2016 02:02 PM, Rohit Yadav wrote: >>>> >>>> Hi Martin, >>>> >>>> >>>> Thanks for sharing, on the surface there does not seem to be any issue in >>>> configuration causing the failures. I'm personally running KVM and Ubuntu >>>> hosts based env without issues, I'm on Ubuntu 14.04.4 (Linux bluebox >>>> 3.16.0-45-generic #60~14.04.1-Ubuntu) and java 1.7.0_79. Can you try >>>> upgrading your JRE7 to latest (openjdk-7-jre, >>>> 7u111-2.6.7-0ubuntu0.14.04.3) on all mgmt server and kvm hosts? >>>> >>>> >>>> If upgrading your JRE does not help, can you increase the logging >>>> verbosity for both the agent and management server (in >>>> /etc/cloudstack/{agent, management} there would be a log4j file, edit that >>>> and replace DEBUG/INFO with TRACE for class/keys com.cloud and >>>> org.apache.cloudstack) and re-share logs when the failures occur? I want >>>> to see what additional information we can get from logs when it tries to >>>> connect to host 10.100.12.10 on port: 8250. >>>> >>>> >>>> Regards. >>>> >>>> ________________________________ >>>> From: martin kolly >>>> <martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch> >>>> Sent: 25 August 2016 17:11:06 >>>> To: >>>> dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org> >>>> Subject: Re: CS 4.9 NIO Selector wait time PR-1601 >>>> >>>> >>>> @Simon: We have one management server with local DB. KVMs connect >>>> directly to the management server without any security/loadbalancing >>>> device. >>>> >>>> Thanks >>>> Martin >>>> >>>> On 08/25/2016 12:41 PM, Simon Weller wrote: >>>> >>>> >>>> Martin, >>>> >>>> Can you provide more detail about your haproxy setup? >>>> Are you running it on separate servers, or on the management server itself? >>>> >>>> - Si >>>> >>>> Simon Weller/ENA >>>> (615) 312-6068 >>>> >>>> rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com> >>>> www.shapeblue.com<http://www.shapeblue.com> >>>> 53 Chandos Place, Covent Garden, London WC2N 4HSUK >>>> @shapeblue >>>> >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: martin kolly >>>> [martin.ko...@senselan.ch<mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch>] >>>> Received: Thursday, 25 Aug 2016, 5:04AM >>>> To: Rohit Yadav >>>> [rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com>]; >>>> >>>> dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org> >>>> >>>> [dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org>] >>>> Subject: Re: CS 4.9 NIO Selector wait time PR-1601 >>>> >>>> >>>> thanks for your reply. >>>> >>>> This morning we repeated the upgrade process from 4.8 to 4.9 with the >>>> following repository: >>>> http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/. >>>> <http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/> >>>> Unfortunately we run into the same issue: >>>> >>>> /2016-08-25 09:49:00,660 INFO [utils.nio.NioClient] (main:null) >>>> (logid:) Connecting to 10.100.12.10:8250// >>>> //2016-08-25 09:49:00,668 WARN [utils.nio.Link] (main:null) (logid:) >>>> This SSL engine was forced to close inbound due to end of stream.// >>>> //2016-08-25 09:49:00,668 ERROR [utils.nio.NioClient] (main:null) >>>> (logid:) SSL Handshake failed while connecting to host: 10.100.12.10 >>>> port: 8250// >>>> //2016-08-25 09:49:00,668 ERROR [utils.nio.NioConnection] (main:null) >>>> (logid:) Unable to initialize the threads.// >>>> //java.io.IOException: SSL Handshake failed while connecting to host: >>>> 10.100.12.10 port: 8250// >>>> // at com.cloud.utils.nio.NioClient.init(NioClient.java:67)// >>>> // at com.cloud.utils.nio.NioConnection.start(NioConnection.java:88)// >>>> // at com.cloud.agent.Agent.start(Agent.java:237)// >>>> // at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:399)// >>>> // at >>>> com.cloud.agent.AgentShell.launchAgentFromClassInfo(AgentShell.java:367)// >>>> // at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:351)// >>>> // at com.cloud.agent.AgentShell.start(AgentShell.java:456)// >>>> // at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)// >>>> // at >>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)// >>>> // at >>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)// >>>> // at java.lang.reflect.Method.invoke(Method.java:606)// >>>> // at >>>> org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)// >>>> //2016-08-25 09:49:00,669 INFO [utils.exception.CSExceptionErrorCode] >>>> (main:null) (logid:) Could not find exception: >>>> com.cloud.utils.exception.NioConnectionException in error code list for >>>> exceptions// >>>> //2016-08-25 09:49:00,669 WARN [cloud.agent.Agent] (main:null) (logid:) >>>> NIO Connection Exception >>>> com.cloud.utils.exception.NioConnectionException: SSL Handshake failed >>>> while connecting to host: 10.100.12.10 port: 8250// >>>> //2016-08-25 09:49:00,670 INFO [cloud.agent.Agent] (main:null) (logid:) >>>> Attempted to connect to the server, but received an unexpected >>>> exception, trying again.../ >>>> >>>> *KVM Hosts: >>>> */# java -version >>>> java version "1.7.0_95" >>>> OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.14.04.1) >>>> OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode) >>>> >>>> # dpkg --get-selections | grep -e 'jdk' -e 'java' >>>> ca-certificates-java install >>>> java-common install >>>> libcommons-daemon-java install >>>> openjdk-7-jre-headless:amd64 install >>>> tzdata-java install >>>> >>>> # apt-cache policy cloudstack-agent >>>> cloudstack-agent: >>>> Installed: 4.9.0 >>>> Candidate: 4.9.0 >>>> Version table: >>>> *** 4.9.0 0 >>>> 500 >>>> http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/ ./ Packages >>>> 100 /var/lib/dpkg/status >>>> >>>> # find /usr/share/ -name "cloud-utils*.jar" >>>> /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar >>>> # md5sum /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar >>>> a8de7306d7c80b5a73e93b83afdd119f >>>> /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar >>>> >>>> >>>> /*Management Server: >>>> */# java -version// >>>> //java version "1.7.0_95"// >>>> //OpenJDK Runtime Environment (IcedTea 2.6.4) >>>> (7u95-2.6.4-0ubuntu0.14.04.1)// >>>> //OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)// >>>> // >>>> //# dpkg --get-selections | grep -e 'jdk' -e 'java'// >>>> //ca-certificates-java install// >>>> //java-common install// >>>> //libcommons-collections3-java install// >>>> //libcommons-daemon-java install// >>>> //libcommons-dbcp-java install// >>>> //libcommons-pool-java install// >>>> //libecj-java install// >>>> //libgeronimo-jta-1.1-spec-java install// >>>> //libmysql-java install// >>>> //libservlet2.5-java install// >>>> //libtomcat6-java install// >>>> //openjdk-7-jre-headless:amd64 install// >>>> //tzdata-java install// >>>> // >>>> //# apt-cache policy cloudstack-management// >>>> //cloudstack-management:// >>>> // Installed: 4.9.0// >>>> // Candidate: 4.9.0// >>>> // Version table:// >>>> // *** 4.9.0 0// >>>> // 500 >>>> http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/ ./ Packages// >>>> // 100 /var/lib/dpkg/status/// >>>> >>>> /# find /usr/share/ -name "cloud-utils*.jar"// >>>> ///usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar// >>>> ///usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar// >>>> ///usr/share/cloudstack-usage/lib/cloud-utils-4.9.0.jar// >>>> //# md5sum >>>> /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar// >>>> //a8de7306d7c80b5a73e93b83afdd119f >>>> /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar// >>>> //# md5sum /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar// >>>> //a8de7306d7c80b5a73e93b83afdd119f >>>> /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar// >>>> //# md5sum /usr/share/cloudstack-usage/lib/cloud-utils-4.9.0.jar// >>>> //a8de7306d7c80b5a73e93b83afdd119f >>>> /usr/share/cloudstack-usage/lib/cloud-utils-4.9.0.jar/ >>>> >>>> The classpath.conf was not modified: >>>> /# cat /etc/cloudstack/management/classpath.conf >>>> #!/bin/bash >>>> #... >>>> >>>> SYSTEMJARS="" >>>> SCP=$(build-classpath $SYSTEMJARS 2>/dev/null) ; if [ $? != 0 ] ; then >>>> export SCP="" ; fi >>>> MCP="" >>>> DCP="/usr/share/tomcat6/bin/bootstrap.jar:/usr/share/tomcat6/bin/tomcat-juli.jar" >>>> CLASSPATH=$SCP:$DCP:$MCP:/etc/cloudstack/management:/usr/share/cloudstack-management/setup >>>> for jarfile in ""/* ; do >>>> if [ ! -e "$jarfile" ] ; then continue ; fi >>>> CLASSPATH=$jarfile:$CLASSPATH >>>> done >>>> for plugin in ""/* ; do >>>> if [ ! -e "$plugin" ] ; then continue ; fi >>>> CLASSPATH=$plugin:$CLASSPATH >>>> done >>>> for vendorconf in "/etc/cloudstack/management"/vendor/* ; do >>>> if [ ! -d "$vendorconf" ] ; then continue ; fi >>>> CLASSPATH=$vendorconf:$CLASSPATH >>>> done >>>> export CLASSPATH >>>> if ([ -z "$JAVA_HOME" ] || [ ! -d "$JAVA_HOME" ]) && [ -d >>>> /usr/lib/jvm/jre-1.7.0 ]; then >>>> export JAVA_HOME=/usr/lib/jvm/jre-1.7.0 >>>> fi >>>> PATH=$JAVA_HOME/bin:/sbin:/usr/sbin:$PATH >>>> export PATH/ >>>> >>>> Regards >>>> Martin >>>> >>>> On 08/24/2016 06:56 PM, Rohit Yadav wrote: >>>> >>>> >>>> Martin, >>>> >>>> >>>> Were you able to fix your issue after installing packages from the >>>> repo Will shared and restarting the services? >>>> >>>> I've not personally tested the apt-get.eu repo, but I had earlier >>>> built this repo which I'm personally using in my local KVM-trusty >>>> based cloud: http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/ >>>> >>>> >>>> If you're still getting the error, can you share the JRE version >>>> you're running, both on the mgmt server and on the KVM hosts? You can >>>> run java -version, or share output of "dpkg --get-selections | grep -e >>>> 'jdk' -e 'java'". Are you running CloudStack with any additional plugins? >>>> >>>> From the logs, looks like there are mixed jar files, >>>> NioConnectionException class was not found -- something's wrong with >>>> your installation. there must be a cloud-utils jar file make sure your >>>> installation don't have multiple copies/versions of jars >>>> (somewhere) in the in /usr/share/cloudstack-common and in >>>> /usr/share/cloudstack-management/webapps/client/ paths: >>>> >>>> Could not find exception: >>>> com.cloud.utils.exception.NioConnectionException in error code list for >>>> exceptions >>>> The error "Unable to initialize the threads." suggests, JVM was not >>>> able to spawn threads. I would like to know your JRE version and any >>>> other settings configured in /etc/cloudstack/management/classpath.conf >>>> (and there are bunch of other files where JAVA_OPTS might have been >>>> overridden). Note: For now you should only be using JRE1.7. >>>> >>>> >>>> Regards. >>>> >>>> rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com> >>>> www.shapeblue.com<http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com> >>>> @shapeblue >>>> >>>> >>>> >>>> >>>> ------------------------------------------------------------------------ >>>> *From:* martin kolly >>>> <martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch><mailto:martin.ko...@senselan.ch> >>>> *Sent:* 24 August 2016 19:53:26 >>>> *To:* >>>> dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org>; >>>> Rohit Yadav >>>> *Subject:* Re: CS 4.9 NIO Selector wait time PR-1601 >>>> >>>> Thanks Will! >>>> >>>> yes the repo is pointing to 4.9 release for all KVMs and for the >>>> management server: >>>> /cloudstack:~# cat /etc/apt/sources.list.d/cloudstack.list // >>>> //deb http://cloudstack.apt-get.eu/ubuntu trusty 4.9/ >>>> >>>> All KVM agents and the mgmt server are upgraded to release 4.9 based >>>> on the documentation.We have restarted all the cloudstack-agents and >>>> the cloudstack-management service as well. >>>> >>>> Network traces are showing packets from KVM <-> Mgmt on port 8250. >>>> there is no security device in between. >>>> >>>> thanks >>>> fanfarlo >>>> >>>> >>>> >>>> >>>> On 08/24/2016 04:13 PM, Will Stevens wrote: >>>> >>>> >>>> @rohit, I am guessing they should be installing the cloudstack-agent using >>>> the following repo right? That is what is described in the upgrade (trusty >>>> instead of precise though). >>>> >>>> http://cloudstack.apt-get.eu/ubuntu/dists/trusty/4.9/ >>>> >>>> @fanfarlo, are your repo's setup to point to the new 4.9 version? >>>> >>>> cheers, >>>> >>>> will >>>> >>>> On Wed, Aug 24, 2016 at 9:46 AM, Rohit Yadav >>>> <rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com> >>>> wrote: >>>> >>>> >>>> >>>> The PR and fix already exists in 4.9.0 release. Please make sure to >>>> upgrade all of your management server(s) and KVM agents and then also >>>> restart them after the upgrade. >>>> >>>> >>>> If you are seeing SSL handshake failures, it could be due to network or >>>> security issue and most likely due to mismatch between CloudStack mgmt >>>> server and KVM agent version. >>>> >>>> >>>> Regards. >>>> >>>> rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com> >>>> www.shapeblue.com<http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com> >>>> @shapeblue >>>> >>>> >>>> >>>> ------------------------------ >>>> *From:* Will Stevens >>>> <williamstev...@gmail.com><mailto:williamstev...@gmail.com><mailto:williamstev...@gmail.com><mailto:williamstev...@gmail.com> >>>> *Sent:* 24 August 2016 18:17:17 >>>> *To:* >>>> dev@cloudstack.apache.org<mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org><mailto:dev@cloudstack.apache.org>; >>>> Rohit Yadav >>>> *Subject:* Re: CS 4.9 NIO Selector wait time PR-1601 >>>> >>>> >>>> That PR is already merged, so you don't have to do anything to get that >>>> code, you already have it. >>>> >>>> @rohit, can you review this? I think this is a similar to the issue Simon >>>> reported earlier. >>>> >>>> Will >>>> >>>> On Aug 24, 2016 6:56 AM, "fanfarlo" >>>> <fanfar...@gmail.com><mailto:fanfar...@gmail.com><mailto:fanfar...@gmail.com><mailto:fanfar...@gmail.com> >>>> wrote: >>>> >>>> >>>> >>>> hi all >>>> >>>> We have the following environment: >>>> - OS: Debian 14.04 (hypervisors and management) >>>> - 4 KVM Hosts >>>> - Cloudstack Release 4.9 with local database >>>> >>>> Since we upgraded to Release 4.9 the KVM hosts no longer connect to the >>>> management Server. Upgrade procedure was followed as described: >>>> http://docs.cloudstack.apache.org/projects/cloudstack-releas >>>> e-notes/en/4.9.0/upgrade/upgrade-4.8.html >>>> >>>> >>>> On the KVM hosts we have the following error message: >>>> /2016-08-24 10:42:49,678 INFO [utils.exception.CSExceptionErrorCode] >>>> (main:null) (logid:) Could not find exception: >>>> com.cloud.utils.exception.NioConnectionException in error code list for >>>> exceptions >>>> 2016-08-24 10:42:49,678 WARN [cloud.agent.Agent] (main:null) (logid:) >>>> NIO Connection Exception >>>> com.cloud.utils.exception.NioConnectionException: SSL Handshake failed >>>> while connecting to host: 10.100.12.10 port: 8250 >>>> 2016-08-24 10:42:49,678 INFO [cloud.agent.Agent] (main:null) (logid:) >>>> Attempted to connect to the server, but received an unexpected >>>> exception, trying again... >>>> 2016-08-24 10:42:54,679 INFO [utils.nio.NioClient] (main:null) (logid:) >>>> Connecting to 10.100.12.10:8250 >>>> 2016-08-24 10:42:54,684 WARN [utils.nio.Link] (main:null) (logid:) This >>>> SSL engine was forced to close inbound due to end of stream. >>>> 2016-08-24 10:42:54,684 ERROR [utils.nio.NioClient] (main:null) (logid:) >>>> SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250 >>>> 2016-08-24 10:42:54,685 ERROR [utils.nio.NioConnection] (main:null) >>>> (logid:) Unable to initialize the threads. >>>> java.io.IOException: SSL Handshake failed while connecting to host: >>>> 10.100.12.10 port: 8250 >>>> at com.cloud.utils.nio.NioClient.init(NioClient.java:67) >>>> at com.cloud.utils.nio.NioConnection.start(NioConnection.java:88) >>>> at com.cloud.agent.Agent.start(Agent.java:237) >>>> at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:399) >>>> at >>>> com.cloud.agent.AgentShell.launchAgentFromClassInfo(AgentShell.java:367) >>>> at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:351) >>>> at com.cloud.agent.AgentShell.start(AgentShell.java:456) >>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>>> at >>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce >>>> ssorImpl.java:57) >>>> at >>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe >>>> thodAccessorImpl.java:43) >>>> at java.lang.reflect.Method.invoke(Method.java:606) >>>> at >>>> org.apache.commons.daemon.support.DaemonLoader.start(DaemonL >>>> oader.java:243) >>>> 2016-08-24 10:42:54,685 INFO [utils.exception.CSExceptionErrorCode] >>>> (main:null) (logid:) Could not find exception: >>>> com.cloud.utils.exception.NioConnectionException in error code list for >>>> exceptions >>>> 2016-08-24 10:42:54,685 WARN [cloud.agent.Agent] (main:null) (logid:) >>>> NIO Connection Exception >>>> com.cloud.utils.exception.NioConnectionException: SSL Handshake failed >>>> while connecting to host: 10.100.12.10 port: 8250 >>>> 2016-08-24 10:42:54,686 INFO [cloud.agent.Agent] (main:null) (logid:) >>>> Attempted to connect to the server, but received an unexpected >>>> exception, trying again.../ >>>> >>>> >>>> Port is open on the management server, there is no firewall in between. >>>> We found that there was a bug report here: >>>> https://issues.apache.org/jira/browse/CLOUDSTACK-9348. There is a PR >>>> changing the NIO Selector wait time: >>>> https://github.com/apache/cloudstack/pull/1601 which was merged into >>>> master branch. >>>> >>>> Since we installed Release 4.9 we probably need to patch the >>>> NioConection.class as described in PR1601 , right? >>>> >>>> kvm03# unzip -v /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar | >>>> grep NioConnection >>>> 3923 Defl:N 1778 55% 2016-08-02 09:28 05aaf7d5 >>>> com/cloud/utils/nio/NioConnection$1.class >>>> 881 Defl:N 495 44% 2016-08-02 09:28 e378984c >>>> com/cloud/utils/nio/NioConnection$ChangeRequest.class >>>> 15410 Defl:N 7130 54% 2016-08-02 09:28 b3281f5a >>>> com/cloud/utils/nio/NioConnection.class >>>> 1134 Defl:N 584 49% 2016-08-02 09:28 8d5cb4a8 >>>> com/cloud/utils/exception/NioConnectionException.class >>>> >>>> Due to a lack of java expertise we have some basic questions: >>>> - Is there a patched jar file available ? public build server? >>>> - Do we need to create the jar from sources ? procedure? >>>> - How do we apply the patch ? >>>> >>>> many thanks! >>>> fanfarlo >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> rohit.ya...@shapeblue.com<mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com><mailto:rohit.ya...@shapeblue.com> >>>> www.shapeblue.com<http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com> >>>> 53 Chandos Place, Covent Garden, London WC2N 4HSUK >>>> @shapeblue >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> rohit.ya...@shapeblue.com >>> www.shapeblue.com<http://www.shapeblue.com> >>> 53 Chandos Place, Covent Garden, London WC2N 4HSUK >>> @shapeblue >>> >>> >>> >>> > > rohit.ya...@shapeblue.com > www.shapeblue.com > 53 Chandos Place, Covent Garden, London WC2N 4HSUK > @shapeblue > > > >