Hi Rohit
Finally we had some time to dig deeper into this issue. The /bin/bash
issue was a typo on our side, sorry!
System VM was destroyed/recreated several times.
But something with the ssh keys is still wrong. The KVM is still asking
for the pass phrase when ssh into system vms.
*# Management Server*
mysql> select * from configuration where name = "ssh.publickey" \G;
*************************** 1. row ***************************
category: Hidden
instance: DEFAULT
component: management-server
name: ssh.publickey
value:
XiVYy4PpuS4nKoQZRsj+3VQr0dxXFt+s15ZVXiIbaacJ5RsLwlSEO75kbPaL7pDAhyZbcCtC3ICSDUAeJysZ958cTVrl390Lk0UtRoLOUKlNIuOEVzdJpZDmsrm90ZK+OYMaHKf2cVz3xekZv/BS5Kpr2dnieUNZlQlgCXgldwENNZ7+MSlPRLiJ+7/SEBtfn9VTUkXvedhbBtIh3fxexOR/xN+zGRTyotlYANxiypklrlZp6/364JfXoeQviE6aVzsVDboALW+ov6gFj8ziO7vhoUN9M6rJohGnPawjgMCBasVCjWqoFaezKlf/0vEDf3NkKOoi1kqvSDGOumCFyQ7+9BOBq8qdqnP3i4e4aJ3vLdVWlRKRaUps08xOP8+j36L9lrO0YAkGXH+bAQXzQMqIFLvjRGX+zOnEikRQ2Zv0JJSlAvftF13BQrTt1SV/ttDa0qo7fzqF1WUijZb/PcKLGUeC30U9kG4JLmrHcuaoe43I08qkGJEofUetZGQLIYhsUqXiVSflGEGfJTWJgMGKAVbGUF5a
description: Public key for the entire CloudStack
default_value: NULL
updated: NULL
scope: NULL
is_dynamic: 0
1 row in set (0.00 sec)
mysql> select * from configuration where name = "ssh.privatekey" \G;
*************************** 1. row ***************************
category: Hidden
instance: DEFAULT
component: management-server
name: ssh.privatekey
value:
j+Z5ckPXUq7lwo2L4OYaaiDOA/fdUpadX3h7E/DA1rwPRSz967eckCbRRpCtCWxfQjtAgPNfLgGIQHX4K7o9yF9DBXG0ESXoECyrRAOTmfZdvdSZceRGd9TYcjwCAn4s8GKGQRKeQSQLL8Pusc/PmYLISIomkOHhZB3lp1bfniKC6uf6RwhRFNfX38ZeqVhRNgagK2hxOo67Ev7uUX1NIUWw/ROq/RC7bG58DD1uhUOX5Bm5wVMSmf3Pv7s/z2t7ZGcX8YMnW
.....
....
....
....
/uPooWhUQpcG8Q8APvfEKMp0BVJATh4Gk4qkgnR4CHyefUYCDHmWZnpuCj1ra7Kn5LV7AdWG1+C7bl7hSmpxE8StLfEwe72zuPJsI86LekxhJJpsorjvSFYklJhGizSs4uuyRP2Dqayx1qFkFJe26AiWb86ld2gQ2Vwj3ozgHX2OMp60jQM8ZVM9VuroOEmiOZ5gkfhFFIf5Y78WcZU2ScSCAHVavNAIsfX+i01iNxwnXtrmrA9ctlZGuZZRR3TklAXQz4M69SFFYuo5e7OsJ2/Tr9K7AhUHA8kwR+BxPLFGRlJpyhmLUuPartDKRBAp2BwU1XsFiIs8Y6pn9wkRJaeeDdpUt9zPCYMHaHtXQvFMe+d96J4HMPvfLgGnd55Hz7GKsglX9su7XpuAqzkuPM5u6GY5LLAtXUOqE2Wm855sLRW1kbHnX8pAfIe9wQbfUQ9wsfCh66w1duce0vyuj4A7A8HC4V8+U2qXjwMIhLpfO5phfUy4drlUzBzx9ZMK+JUYP6kszj61M09psoYggUBXMYCfg6oWJSjA8=
description: Private key for the entire CloudStack
default_value: NULL
updated: NULL
scope: NULL
is_dynamic: 0
1 row in set (0.00 sec)
*# Secondary Storage VM (note that the key is equal to the management
server database entry ssh.publickey)*
root@s-252-VM:~# cat /root/.ssh/authorized_keys
XiVYy4PpuS4nKoQZRsj+3VQr0dxXFt+s15ZVXiIbaacJ5RsLwlSEO75kbPaL7pDAhyZbcCtC3ICSDUAeJysZ958cTVrl390Lk0UtRoLOUKlNIuOEVzdJpZDmsrm90ZK+OYMaHKf2cVz3xekZv/BS5Kpr2dnieUNZlQlgCXgldwENNZ7+MSlPRLiJ+7/SEBtfn9VTUkXvedhbBtIh3fxexOR/xN+zGRTyotlYANxiypklrlZp6/364JfXoeQviE6aVzsVDboALW+ov6gFj8ziO7vhoUN9M6rJohGnPawjgMCBasVCjWqoFaezKlf/0vEDf3NkKOoi1kqvSDGOumCFyQ7+9BOBq8qdqnP3i4e4aJ3vLdVWlRKRaUps08xOP8+j36L9lrO0YAkGXH+bAQXzQMqIFLvjRGX+zOnEikRQ2Zv0JJSlAvftF13BQrTt1SV/ttDa0qo7fzqF1WUijZb/PcKLGUeC30U9kG4JLmrHcuaoe43I08qkGJEofUetZGQLIYhsUqXiVSflGEGfJTWJgMGKAVbGUF5a
root@s-252-VM:~# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 0e:00:a9:fe:00:18
inet addr:169.254.0.24 Bcast:169.254.255.255 Mask:255.255.0.0
*#KVM Hosts*
root@kvm704:~/.ssh# cat /root/.ssh/id_rsa.pub.cloud
XiVYy4PpuS4nKoQZRsj+3VQr0dxXFt+s15ZVXiIbaacJ5RsLwlSEO75kbPaL7pDAhyZbcCtC3ICSDUAeJysZ958cTVrl390Lk0UtRoLOUKlNIuOEVzdJpZDmsrm90ZK+OYMaHKf2cVz3xekZv/BS5Kpr2dnieUNZlQlgCXgldwENNZ7+MSlPRLiJ+7/SEBtfn9VTUkXvedhbBtIh3fxexOR/xN+zGRTyotlYANxiypklrlZp6/364JfXoeQviE6aVzsVDboALW+ov6gFj8ziO7vhoUN9M6rJohGnPawjgMCBasVCjWqoFaezKlf/0vEDf3NkKOoi1kqvSDGOumCFyQ7+9BOBq8qdqnP3i4e4aJ3vLdVWlRKRaUps08xOP8+j36L9lrO0YAkGXH+bAQXzQMqIFLvjRGX+zOnEikRQ2Zv0JJSlAvftF13BQrTt1SV/ttDa0qo7fzqF1WUijZb/PcKLGUeC30U9kG4JLmrHcuaoe43I08qkGJEofUetZGQLIYhsUqXiVSflGEGfJTWJgMGKAVbGUF5aroot@kvm704
root@kvm704:~/.ssh# ssh -p 3922 -q -o StrictHostKeyChecking=no -i
/root/.ssh/id_rsa.cloud [email protected]
Enter passphrase for key '/root/.ssh/id_rsa.cloud':
*# Router which was not redeployed *
root@r-191-VM:~# cat .ssh/authorized_keys
ssh-rsa
AAAAB3NzaC1yc2EAAAADAQABAAABAQDoZcfuh/R2i/n69hFBJEfKKtTQQAs7iOIC9PxFUWrFONk9m4R8Atj8o5wLyHflhe10iMHUJTGH84/J4min1r7zU4xGqzOBR3CKuz6dowmC+IQ38iyJfTOC9L1eIl8AebmKNwSSgjS6nc6sK13iFpgr8mSIr7ExcXPxYgWNgbQAm/HvLVc8Ivf2FZR8f26i3EyjsLVKpHoyvVMeVdoAHBdQB/NiUslAz2fz3cPXxVSxpZobd7iy/uJNVK1CHvqhGYLrXym9cjfM0BOUhN80NN9SvAiyj9D7PHiwcDBzMSSPljxwF6rPY7BXHq+7AVvzVPTJYEuBVQllYn1RrjCS/Fdr
cloud@cloudstack
If we understand correctly the public and private key are encrypted in
the management database. Based on the Source Code
master# ag ssh.publickey
server/src/com/cloud/server/ConfigurationServerImpl.java
761: "VALUES ('Hidden','DEFAULT',
'management-server','ssh.publickey', '"
+*DBEncryptionUtil.encrypt(publicKey)* +
855: String pubKey = _configDao.getValue("ssh.publickey");
- It looks like the rsa key is still encrypted on the KVM Server and on
the System VMs?
- Could this issue be related with the keystore cleanup?
- We are running out of ideas....arrr
regards
Martin
On 08/31/2016 07:31 PM, Rohit Yadav wrote:
> Hi Martin,
>
>
> Looks like something went wrong with the CloudStack generated public/private
> keys. You should not be prompted for any password when trying to ssh to
> systemvms on port 3922. Can you try to destroy the VRs or create a new
> network which would create a new VR.
>
>
> Regarding the /bin/bash not found issue, that looks fishy. /bin/bash should
> be available on both Ubuntu 14.04 based KVM hosts and also on systemvms
> (which are Debian7 based).
>
>
> Regards.
>
> ________________________________
> From: martin kolly <[email protected]>
> Sent: 29 August 2016 20:10:32
> To: [email protected]
> Subject: Re: CS 4.9 NIO Selector wait time PR-1601
>
>
> We have done more investigations. An error happens when the script
> /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh is
> executed:
>
> MGMT LOGS:
> 2016-08-29 16:00:11,342 DEBUG [c.c.a.t.Request]
> (AgentManager-Handler-12:null) (logid:) Seq 8-1837187172990452475:
> Processing: { Ans: , MgmtId: 90520741415395, via: 8, Ver: v1, Flags:
> 10,
> [{"com.cloud.agent.api.GetRouterAlertsAnswer":{"result":false,"details":"/usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh:
> 1: /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh:
> h!/bin/bash: not found","wait":0}}] }
> 2016-08-29 16:00:11,345 DEBUG [c.c.a.t.Request]
> (RouterStatusMonitor-1:ctx-9cdc1e33) (logid:5f04bf82) Seq
> 8-1837187172990452475: Received: { Ans: , MgmtId: 90520741415395, via:
> 8(kvm702), Ver: v1, Flags: 10, { GetRouterAlertsAnswer } }
> *2016-08-29 16:00:11,345 DEBUG [c.c.a.m.AgentManagerImpl]
> (RouterStatusMonitor-1:ctx-9cdc1e33) (logid:5f04bf82) Details from
> executing class com.cloud.agent.api.routing.GetRouterAlertsCommand:
> /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh: 1:
> /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh:
> h!/bin/bash: not found*
> 2016-08-29 16:00:11,345 WARN
> [c.c.n.r.VirtualNetworkApplianceManagerImpl]
> (RouterStatusMonitor-1:ctx-9cdc1e33) (logid:5f04bf82) Unable to get
> alerts from router r-221-VM
> /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh: 1:
> /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh:
> h!/bin/bash: not found
> 2016-08-29 16:00:11,411 INFO [o.a.c.e.o.NetworkOrchestrator]
> (Network-Scavenger-1:ctx-64537bea) (logid:0c5d8104)
> NetworkGarbageCollector uses '600' seconds for GC interval.
> 2016-08-29 16:00:11,416 DEBUG [o.a.c.e.o.NetworkOrchestrator]
> (Network-Scavenger-1:ctx-64537bea) (logid:0c5d8104) We found network 219
> to be free for the first time. Adding it to the list: 1472479211
> 2016-08-29 16:00:12,392 DEBUG [c.c.s.StatsCollector]
> (StatsCollector-6:ctx-c1aeb528) (logid:a364cc05) StorageCollector is
> running...
> 2016-08-29 16:00:12,403 DEBUG [c.c.h.o.r.Ovm3HypervisorGuru]
> (StatsCollector-6:ctx-c1aeb528) (logid:a364cc05)
> getCommandHostDelegation: class com.cloud.agent.api.GetStorageStatsCommand
>
> For debugging we modified the "router_proxy.sh" slightly on the KVM Host:
>
> kvm# grep -A 2 DEBUG
> /usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh -n
> 46:# DEBUG CS Upgrade
> *47-echo "$(date): ssh -p 3922 -q -o StrictHostKeyChecking=no -i $cert
> root@$domRIp /opt/cloud/bin/$script $*" >> /tmp/router.txt*
> 48-ssh -p 3922 -q -o StrictHostKeyChecking=no -i $cert root@$domRIp
> "/opt/cloud/bin/$script $*"
>
> kvm# tail /tmp/router.txt
> Mon Aug 29 16:10:11 CEST 2016: ssh -p 3922 -q -o
> StrictHostKeyChecking=no -i /root/.ssh/id_rsa.cloud [email protected]
> /opt/cloud/bin/netusage.sh -g
> Mon Aug 29 16:10:11 CEST 2016: ssh -p 3922 -q -o
> StrictHostKeyChecking=no -i /root/.ssh/id_rsa.cloud [email protected]
> /opt/cloud/bin/netusage.sh -g
>
> On the router with IP 169.254.2.67 we see the failed logins as well:
>
> root@r-221-VM:~# tail -n 2 /var/log/auth.log
> Aug 29 14:10:11 r-221-VM sshd[7074]: Connection closed by 169.254.0.1
> [preauth]
> Aug 29 14:10:11 r-221-VM sshd[7075]: Connection closed by 169.254.0.1
> [preauth]
>
> These commands did not succeed. If we execute the command manually we
> get prompted for the key passphrase.
>
> kvm# ssh -p 3922 -o StrictHostKeyChecking=no -i /root/.ssh/id_rsa.cloud
> [email protected]
> Enter passphrase for key '/root/.ssh/id_rsa.cloud':
>
> can someone confirm that the rsa key is protected with a passhrase? this
> looks suspicious to us...
>
> regards
> martin
>
>
>
>
>
>
>
> On 08/29/2016 10:09 AM, martin kolly wrote:
>> thanks Simon and Rohit for the valuable inputs! After applying the
>> following procedure the ssl errors are gone and host state is UP.
>>
>> # service cloudstack-management stop
>>
>> mysql> delete from configuration where name = "ssl.keystore" ;
>> # mv /etc/cloudstack/management/cloudmanagementserver.keystore
>> /etc/cloudstack/management/cloudmanagementserver.keystore.old
>>
>> # service cloudstack-management start
>>
>> # file /etc/cloudstack/management/cloudmanagementserver.keystore
>> /etc/cloudstack/management/cloudmanagementserver.keystore: Java KeyStore
>> # file /etc/cloudstack/management/cloudmanagementserver.keystore.old
>> /etc/cloudstack/management/cloudmanagementserver.keystore.old: data
>>
>> #java -version
>> java version "1.7.0_111"
>>
>> However new routers are not deployed, we still see entries that KVM
>> hosts are unreachable (logs underneath).
>>
>> After starting a new router it is shown with "virsh list" on the KVM:
>> kvm# virsh list | grep r-233-VM
>> 111 r-233-VM running
>>
>> After some seconds the router is deleted,
>>
>> kvm# virsh list | grep r-233-VM
>> #
>>
>> kvm#virsh -v
>> 1.2.2
>>
>> (libvirt was already restarted)
>>
>>
>>
>> ############ Logs/Info KVM #################
>>
>> kvm # dpkg -l | grep cloudstack
>> ii cloudstack-agent
>> 4.9.0 all CloudStack agent
>> ii cloudstack-common
>> 4.9.0 all A common package
>> which contains files which are shared by several CloudStack packages
>>
>> kvm# df -kh | grep cloud
>> 10.100.12.9:/export/cloud 188G 137G 51G 73%
>> /mnt/5db02c19-1e8f-3591-bdb4-02608362521e
>>
>> kvm# tail -f /var/log/cloudstack/agent/agent.log
>> ...
>> 2016-08-29 06:55:06.203+0000: 4425: error :
>> qemuMonitorFindBalloonObjectPath:1032 : internal error: Cannot determine
>> balloon device path
>> 2016-08-29 06:55:06.223+0000: 4426: error :
>> qemuMonitorFindBalloonObjectPath:1032 : internal error: Cannot determine
>> balloon device path
>> 2016-08-29 06:55:24.062+0000: 4425: warning : qemuDomainObjTaint:1628 :
>> Domain id=112 name='r-233-VM' uuid=d9dcd37a-242d-43ac-a18e-79a4bfa86ebb
>> is tainted: high-privileges
>> 2016-08-29 06:55:49.066+0000: 4423: error : qemuMonitorIO:656 : internal
>> error: End of file from monitor
>>
>>
>>
>>
>> ############ Logs/Info Management Server ########
>>
>> # dpkg -l | grep cloudstack
>> ii cloudstack-agent
>> 4.9.0 all CloudStack agent
>> ii cloudstack-common
>> 4.9.0 all A common package
>> which contains files which are shared by several CloudStack packages
>> ii cloudstack-management
>> 4.9.0 all CloudStack server library
>> ii cloudstack-usage
>> 4.9.0 all CloudStack usage monitor
>>
>> # mysql -u root cloud -e "select id,name,path from cloud.storage_pool
>> where pool_type='Filesystem'"
>> +----+----------------------+-------------------------+
>> | id | name | path |
>> +----+----------------------+-------------------------+
>> | 1 | kvm704 Local Storage | /var/lib/libvirt/images |
>> | 2 | kvm701 Local Storage | /var/lib/libvirt/images |
>> | 3 | kvm702 Local Storage | /var/lib/libvirt/images |
>> | 4 | kvm703 Local Storage | /var/lib/libvirt/images |
>> +----+----------------------+-------------------------+
>>
>> #tail -f /var/log/cloudstack/management/management-server.log | grep -v
>> DEBUG
>> ...
>> 2016-08-29 08:55:42,484 WARN [o.a.c.alerts]
>> (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a)
>> (logid:337b354e) alertType:: 9 // dataCenterId:: 1 // podId:: 1 //
>> clusterId:: null // message:: Command:
>> com.cloud.agent.api.GetDomRVersionCommand failed while starting virtual
>> router
>> 2016-08-29 08:55:42,494 ERROR
>> [c.c.n.r.VirtualNetworkApplianceManagerImpl]
>> (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a)
>> (logid:337b354e) GetDomRVersionCmd failed
>> 2016-08-29 08:55:42,495 WARN
>> [c.c.n.r.VirtualNetworkApplianceManagerImpl]
>> (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a)
>> (logid:337b354e) Command: com.cloud.agent.api.GetDomRVersionCommand
>> failed while starting virtual router
>> 2016-08-29 08:55:42,495 INFO [c.c.v.VirtualMachineManagerImpl]
>> (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a)
>> (logid:337b354e) The guru did not like the answers so stopping
>> VM[DomainRouter|r-233-VM]
>> 2016-08-29 08:55:49,184 INFO [o.a.c.f.j.i.AsyncJobManagerImpl]
>> (AsyncJobMgr-Heartbeat-1:ctx-5bd7c179) (logid:86c0376e) Begin cleanup
>> expired async-jobs
>> 2016-08-29 08:55:49,192 INFO [o.a.c.f.j.i.AsyncJobManagerImpl]
>> (AsyncJobMgr-Heartbeat-1:ctx-5bd7c179) (logid:86c0376e) End cleanup
>> expired async-jobs
>> 2016-08-29 08:55:50,861 ERROR [c.c.v.VirtualMachineManagerImpl]
>> (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a)
>> (logid:337b354e) Failed to start instance VM[DomainRouter|r-233-VM]
>> com.cloud.utils.exception.ExecutionException: Unable to start
>> VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart,
>> not retrying
>> at
>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1084)
>> at
>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592)
>> at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at
>> com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
>> at
>> com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4753)
>> at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
>> at
>> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554)
>> at
>> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>> at
>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>> at
>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>> at
>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>> at
>> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>> at
>> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502)
>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> 2016-08-29 08:55:51,136 ERROR [c.c.v.VmWorkJobHandlerProxy]
>> (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a)
>> (logid:337b354e) Invocation exception, caused by:
>> com.cloud.exception.AgentUnavailableException: Resource [Host:8] is
>> unreachable: Host 8: Unable to start instance due to Unable to start
>> VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart,
>> not retrying
>> 2016-08-29 08:55:51,136 INFO [c.c.v.VmWorkJobHandlerProxy]
>> (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739 ctx-68467c7a)
>> (logid:337b354e) Rethrow exception
>> com.cloud.exception.AgentUnavailableException: Resource [Host:8] is
>> unreachable: Host 8: Unable to start instance due to Unable to start
>> VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart,
>> not retrying
>> 2016-08-29 08:55:51,137 ERROR [c.c.v.VmWorkJobDispatcher]
>> (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739) (logid:337b354e)
>> Unable to complete AsyncJobVO {id:2739, userId: 2, accountId: 2,
>> instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart,
>> cmdInfo:
>> rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAACAAAAAAAAAAIAAAAAAAAA6XQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwc3IAEWphdmEudXRpbC5IYXNoTWFwBQfawcMWYNEDAAJGAApsb2FkRmFjdG9ySQAJdGhyZXNob2xkeHA_QAAAAAAADHcIAAAAEAAAAAF0AA5SZXN0YXJ0TmV0d29ya3QAP3JPMEFCWE55QUJGcVlYWmhMbXhoYm1jdVFtOXZiR1ZoYnMwZ2NvRFZuUHJ1QWdBQldnQUZkbUZzZFdWNGNBRXhw,
>> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0,
>> result: null, initMsid: 90520741415395, completeMsid: null, lastUpdated:
>> null, lastPolled: null, created: Mon Aug 29 08:55:21 CEST 2016}, job
>> origin:2738
>> com.cloud.exception.AgentUnavailableException: Resource [Host:8] is
>> unreachable: Host 8: Unable to start instance due to Unable to start
>> VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart,
>> not retrying
>> at
>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1120)
>> at
>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592)
>> at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at
>> com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
>> at
>> com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4753)
>> at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
>> at
>> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554)
>> at
>> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>> at
>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>> at
>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>> at
>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>> at
>> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>> at
>> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502)
>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: com.cloud.utils.exception.ExecutionException: Unable to
>> start VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in
>> finalizeStart, not retrying
>> at
>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1084)
>> at
>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592)
>> at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source)
>> ... 17 more
>> 2016-08-29 08:55:51,168 INFO [o.a.c.f.j.i.AsyncJobMonitor]
>> (Work-Job-Executor-112:ctx-b46df979 job-2738/job-2739) (logid:337b354e)
>> Remove job-2739 from job monitoring
>> 2016-08-29 08:55:53,197 INFO [o.a.c.f.j.i.AsyncJobMonitor]
>> (Work-Job-Executor-113:ctx-6411b64b job-2738/job-2740) (logid:db064d99)
>> Add job-2740 into job monitoring
>> 2016-08-29 08:55:53,218 INFO [o.a.c.f.j.i.AsyncJobMonitor]
>> (Work-Job-Executor-113:ctx-6411b64b job-2738/job-2740) (logid:337b354e)
>> Remove job-2740 from job monitoring
>> 2016-08-29 08:55:53,247 ERROR [c.c.a.ApiAsyncJobDispatcher]
>> (API-Job-Executor-68:ctx-d2eb7557 job-2738) (logid:337b354e) Unexpected
>> exception while executing
>> org.apache.cloudstack.api.command.admin.router.StartRouterCmd
>> com.cloud.exception.AgentUnavailableException: Resource [Host:8] is
>> unreachable: Host 8: Unable to start instance due to Unable to start
>> VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in finalizeStart,
>> not retrying
>> at
>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1120)
>> at
>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592)
>> at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at
>> com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
>> at
>> com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4753)
>> at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
>> at
>> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554)
>> at
>> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>> at
>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>> at
>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>> at
>> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>> at
>> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>> at
>> org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502)
>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: com.cloud.utils.exception.ExecutionException: Unable to
>> start VM:d9dcd37a-242d-43ac-a18e-79a4bfa86ebb due to error in
>> finalizeStart, not retrying
>> at
>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1084)
>> at
>> com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592)
>> at sun.reflect.GeneratedMethodAccessor290.invoke(Unknown Source)
>> ... 17 more
>> 2016-08-29 08:55:53,256 INFO [o.a.c.f.j.i.AsyncJobMonitor]
>> (API-Job-Executor-68:ctx-d2eb7557 job-2738) (logid:337b354e) Remove
>> job-2738 from job monitoring
>>
>>
>>
>>
>> regards
>> Martin
>>
>>
>>
>>
>> On 08/26/2016 08:47 PM, Rohit Yadav wrote:
>>> Hi Martin,
>>>
>>>
>>> I checked my openssl connect output and my management server on port 8250
>>> was indeed returning valid certificates. On further investigation, when
>>> connections to management server (NioServer, handled by Link class that
>>> reads /etc/cloudstack/management/cloudmanagementserver.keystore) are
>>> received it uses a keystore file
>>> (/etc/cloudstack/management/cloudmanagementserver.keystore). Please check
>>> if that's valid, or backup your keystore file and do what Simon has
>>> suggested.
>>>
>>>
>>>
>>> For reference, here the output from my 4.9.0 mgmt server running on Ubuntu
>>> 14.04.4:
>>>
>>>
>>> $ openssl s_client -tls1 -connect 192.168.1.11:8250
>>> CONNECTED(00000003)
>>> depth=0 C = Unknown, O = bluebox, OU = bluebox, CN = Cloudstack User
>>> verify error:num=18:self signed certificate
>>> verify return:1
>>> depth=0 C = Unknown, O = bluebox, OU = bluebox, CN = Cloudstack User
>>> verify return:1
>>> ---
>>> Certificate chain
>>> 0 s:/C=Unknown/O=bluebox/OU=bluebox/CN=Cloudstack User
>>> i:/C=Unknown/O=bluebox/OU=bluebox/CN=Cloudstack User
>>> ---
>>> Server certificate
>>> -----BEGIN CERTIFICATE-----
>>> MIIDPzCCAiegAwIBAgIEFHHX2DANBgkqhkiG9w0BAQsFADBQMRAwDgYDVQQGEwdV
>>> bmtub3duMRAwDgYDVQQKEwdibHVlYm94MRAwDgYDVQQLEwdibHVlYm94MRgwFgYD
>>> VQQDEw9DbG91ZHN0YWNrIFVzZXIwHhcNMTUwNTA2MTM0MjMzWhcNMjUwNTAzMTM0
>>> MjMzWjBQMRAwDgYDVQQGEwdVbmtub3duMRAwDgYDVQQKEwdibHVlYm94MRAwDgYD
>>> VQQLEwdibHVlYm94MRgwFgYDVQQDEw9DbG91ZHN0YWNrIFVzZXIwggEiMA0GCSqG
>>> SIb3DQEBAQUAA4IBDwAwggEKAoIBAQC0wN8kfOJMzwlbrOnBj/jjvjjIwDVpYLtH
>>> WoKkNB+rzzKEUxaYwaQxe6E3M536ZuqcaJBIqcYPwTIkWyulvuHuJpSQak4VbuDV
>>> f7dqt5RacLFT0jUciqTvL5QDCrk0uNugKkWgEvNtokVGSBwLPVEcdcGWpku1EpeH
>>> vMYmpOkcWgbC8Z9D7QTlVw6oEWbPAtKr+gDrXdOFnpPPI45rteatIIgKm1Q6JjZM
>>> qrUKfqt7s8ts6ZgdAN2WmtieSsnUX1su9SJMYg2J8LK7UJeGqiNtE+g944GPqtnW
>>> aDldjharq54e79ug2ktxw29I3ulpRD/vgxwZmcPJrePUKwY91KEjAgMBAAGjITAf
>>> MB0GA1UdDgQWBBSUJvLY8RL/1fTVqj1rT8136da4yzANBgkqhkiG9w0BAQsFAAOC
>>> AQEAgDdFIgLvOH/UgRp2nnFUVcMp+uchSLj8CbCkukJBrUwrmJHp3Os+H1ggk8Vt
>>> j3conj06zJBNN/E0J8pcpagE1aR+l4R8WxF3g/Oc7bNyrUlkGSQ82vavg9sEkwHY
>>> eQY/4wj8CprICs9JilgZ6keeWNWgAW1goLZSzGVwz5eE0lPuc2Dg3laR5RsuTxie
>>> dgQhpbOx3UZun+dhuP5NUHc+KWyrNvSZNN8FruO602KWZwm0Hndl7RVbkNEd0kxq
>>> FhFK4Scc2HBrKMUrPTzO1nGCgR1gA015C2MFfmjeW49VTi95WnY8DDG2euUYAtpl
>>> lLVuNxJxq7eDJfP/M9kxSIKgrQ==
>>> -----END CERTIFICATE-----
>>> subject=/C=Unknown/O=bluebox/OU=bluebox/CN=Cloudstack User
>>> issuer=/C=Unknown/O=bluebox/OU=bluebox/CN=Cloudstack User
>>> ---
>>> No client certificate CA names sent
>>> Server Temp Key: ECDH, P-256, 256 bits
>>> ---
>>> SSL handshake has read 1325 bytes and written 331 bytes
>>> ---
>>> New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-SHA
>>> Server public key is 2048 bit
>>> Secure Renegotiation IS supported
>>> Compression: NONE
>>> Expansion: NONE
>>> No ALPN negotiated
>>> SSL-Session:
>>> Protocol : TLSv1
>>> Cipher : ECDHE-RSA-AES256-SHA
>>> Session-ID:
>>> 57C08E5D841D9B74AB533A0000BDBE4275424640B5B7D1985D438B275598436F
>>> Session-ID-ctx:
>>> Master-Key:
>>> E1DE7D7464766DCECD82088561A7D928519CA20BA91FDF4AA38BB2E51AB7C27D3F1ABA5788ADED2AD49FE8597EC4A344
>>> Key-Arg : None
>>> PSK identity: None
>>> PSK identity hint: None
>>> SRP username: None
>>> Start Time: 1472237149
>>> Timeout : 7200 (sec)
>>> Verify return code: 18 (self signed certificate)
>>> ---
>>>
>>>
>>>
>>>
>>> Regards.
>>>
>>> ________________________________
>>> From: martin kolly <[email protected]>
>>> Sent: 26 August 2016 20:36:02
>>> To: [email protected]
>>> Subject: Re: CS 4.9 NIO Selector wait time PR-1601
>>>
>>>
>>> good point, thanks Simon! with openssl we receive a response on port 8250.
>>>
>>> # telnet 10.100.12.10 8250
>>> Trying 10.100.12.10...
>>> Connected to 10.100.12.10.
>>> Escape character is '^]'.
>>> Connection closed by foreign host.
>>>
>>> # nc -zv 10.100.12.10 8250
>>> Connection to 10.100.12.10 8250 port [tcp/*] succeeded!
>>>
>>> # openssl s_client -tls1 -connect 10.100.12.10:8250
>>> CONNECTED(00000003)
>>> write:errno=104
>>> ---
>>> no peer certificate available
>>> ---
>>> No client certificate CA names sent
>>> ---
>>> SSL handshake has read 0 bytes and written 0 bytes
>>> ---
>>> New, (NONE), Cipher is (NONE)
>>> Secure Renegotiation IS NOT supported
>>> Compression: NONE
>>> Expansion: NONE
>>> SSL-Session:
>>> Protocol : TLSv1
>>> Cipher : 0000
>>> Session-ID:
>>> Session-ID-ctx:
>>> Master-Key:
>>> Key-Arg : None
>>> PSK identity: None
>>> PSK identity hint: None
>>> SRP username: None
>>> Start Time: 1472223447
>>> Timeout : 7200 (sec)
>>> Verify return code: 0 (ok)
>>> ---
>>>
>>>
>>> On 08/26/2016 04:49 PM, Simon Weller wrote:
>>>> Martin,
>>>>
>>>>
>>>> Are you able to actually telnet to 8250 from the host to the mgmt server?
>>>>
>>>>
>>>> - Si
>>>>
>>>>
>>>> ________________________________
>>>> From: martin kolly <[email protected]>
>>>> Sent: Friday, August 26, 2016 9:41 AM
>>>> To: [email protected]
>>>> Subject: Re: CS 4.9 NIO Selector wait time PR-1601
>>>>
>>>> Hi Rohit
>>>>
>>>> We highly appreciate your efforts! Unfortunately it still does not work.
>>>> - ulimit is increased on mgmt server
>>>> - jar file replaced
>>>> - we confirm that cloudstack-agent 4.9.0 is installed
>>>>
>>>> MGMT Server
>>>> # wget
>>>> https://github.com/rhtyd/cloudstack/releases/download/4.9.0-nioinbound/cloud-utils-4.9.0.jar
>>>> -O cloud-utils-4.9.0.jar.patch
>>>> # md5sum cloud-utils-4.9.0.jar.patch
>>>> c4496f42cc6741f562ac645c3a3d8a0c cloud-utils-4.9.0.jar.patch
>>>> # md5sum
>>>> /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar
>>>> c4496f42cc6741f562ac645c3a3d8a0c
>>>> /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar
>>>>
>>>> # ulimit -a
>>>> core file size (blocks, -c) 0
>>>> data seg size (kbytes, -d) unlimited
>>>> scheduling priority (-e) 0
>>>> file size (blocks, -f) unlimited
>>>> pending signals (-i) 64109
>>>> max locked memory (kbytes, -l) 64
>>>> max memory size (kbytes, -m) unlimited
>>>> open files (-n) 10240
>>>> pipe size (512 bytes, -p) 8
>>>> POSIX message queues (bytes, -q) 819200
>>>> real-time priority (-r) 0
>>>> stack size (kbytes, -s) 8192
>>>> cpu time (seconds, -t) unlimited
>>>> max user processes (-u) 64109
>>>> virtual memory (kbytes, -v) unlimited
>>>> file locks (-x) unlimited
>>>>
>>>> KVM Server
>>>> # wget
>>>> https://github.com/rhtyd/cloudstack/releases/download/4.9.0-nioinbound/cloud-utils-4.9.0.jar
>>>> -O cloud-utils-4.9.0.jar.patch
>>>> # md5sum cloud-utils-4.9.0.jar.patch
>>>> c4496f42cc6741f562ac645c3a3d8a0c cloud-utils-4.9.0.jar.patch
>>>> # md5sum /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar
>>>> c4496f42cc6741f562ac645c3a3d8a0c
>>>> /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar
>>>>
>>>> # apt-cache policy cloudstack-agent
>>>> cloudstack-agent:
>>>> Installed: 4.9.0
>>>> Candidate: 4.9.0
>>>> Version table:
>>>> *** 4.9.0 0
>>>> 500 http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/
>>>> ./ Packages
>>>> 100 /var/lib/dpkg/status
>>>>
>>>> The logs are attached. By the way: the error message "Caught the Exception
>>>> in VmIpFetchTask" was already there with 4.8 release.
>>>>
>>>> Thanks
>>>> Martin
>>>>
>>>> On 08/25/2016 07:44 PM, Rohit Yadav wrote:
>>>>
>>>> Hi Martin,
>>>>
>>>>
>>>> Thanks for sharing. Alright, I'm not sure what's causing issue but based
>>>> on the logs seems like only KVM agents are having issues while connecting
>>>> to mgmt server as I don't see any Nio related exceptions in the management
>>>> server logs.
>>>>
>>>>
>>>> I could not see the cloudstack-agent version in the logs, I'm assuming
>>>> that they were all upgraded to 4.9.0, and there are no conflicting jars at
>>>> /usr/share/cloudstack-agent/lib.
>>>>
>>>>
>>>> First, can you make sure mgmt server has enough ulimit. I found that
>>>> Ubuntu/Debian's init.d script don't override this while CentOS
>>>> initd/systemd script sets ulimit. On your mgmt server, edit
>>>> /etc/init.d/cloudstack-management and add ulimit -n 10240 just before the
>>>> mgmt server is started in the 'state' section (for me it was at around
>>>> line #147 where it logs a message that it's starting the
>>>> cloudstack-management server).
>>>>
>>>>
>>>> Next, if this still does not solve the issue -- I created a special
>>>> cloud-utils.jar for you that you need to place on your mgmt server and on
>>>> the KVM agents and restart the mgmt server. This will increase verbosity
>>>> of the error while reduce the Nio polling loop timeout (from 100ms to
>>>> 10ms). On KVM agents, the error from the logs is that during SSL handshake
>>>> inbound connection/stream gets closed, and we want to know the exception
>>>> message. Please get the jar from here:
>>>>
>>>> https://github.com/rhtyd/cloudstack/releases/tag/4.9.0-nioinbound and
>>>> place them at:
>>>>
>>>> /usr/share/cloudstack-agent/lib/ (on kvm host)
>>>>
>>>> /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/ (on mgmt
>>>> server host)
>>>>
>>>>
>>>> Let me know what worked for you, and if it still failed can you share the
>>>> mgmt server and agent logs once again. Thanks.
>>>>
>>>>
>>>> Regards.
>>>>
>>>> ________________________________
>>>> From: martin kolly
>>>> <[email protected]><mailto:[email protected]>
>>>> Sent: 25 August 2016 20:50:08
>>>> To: [email protected]<mailto:[email protected]>
>>>> Subject: Re: CS 4.9 NIO Selector wait time PR-1601
>>>>
>>>> Hi Rohit
>>>>
>>>> We are running java version 1.7.0.111 on KVM and management server.
>>>> mgmt# java -version
>>>> java version "1.7.0_111"
>>>> kvm# java -version
>>>> java version "1.7.0_111"
>>>>
>>>> We get the same error message. Attached are the logs with TRACE enabled.
>>>>
>>>> "success consists of going from failure to failure without loss of
>>>> enthusiasm."
>>>>
>>>> regards
>>>> martin
>>>>
>>>> On 08/25/2016 02:02 PM, Rohit Yadav wrote:
>>>>
>>>> Hi Martin,
>>>>
>>>>
>>>> Thanks for sharing, on the surface there does not seem to be any issue in
>>>> configuration causing the failures. I'm personally running KVM and Ubuntu
>>>> hosts based env without issues, I'm on Ubuntu 14.04.4 (Linux bluebox
>>>> 3.16.0-45-generic #60~14.04.1-Ubuntu) and java 1.7.0_79. Can you try
>>>> upgrading your JRE7 to latest (openjdk-7-jre,
>>>> 7u111-2.6.7-0ubuntu0.14.04.3) on all mgmt server and kvm hosts?
>>>>
>>>>
>>>> If upgrading your JRE does not help, can you increase the logging
>>>> verbosity for both the agent and management server (in
>>>> /etc/cloudstack/{agent, management} there would be a log4j file, edit that
>>>> and replace DEBUG/INFO with TRACE for class/keys com.cloud and
>>>> org.apache.cloudstack) and re-share logs when the failures occur? I want
>>>> to see what additional information we can get from logs when it tries to
>>>> connect to host 10.100.12.10 on port: 8250.
>>>>
>>>>
>>>> Regards.
>>>>
>>>> ________________________________
>>>> From: martin kolly
>>>> <[email protected]><mailto:[email protected]><mailto:[email protected]><mailto:[email protected]>
>>>> Sent: 25 August 2016 17:11:06
>>>> To:
>>>> [email protected]<mailto:[email protected]><mailto:[email protected]><mailto:[email protected]>
>>>> Subject: Re: CS 4.9 NIO Selector wait time PR-1601
>>>>
>>>>
>>>> @Simon: We have one management server with local DB. KVMs connect
>>>> directly to the management server without any security/loadbalancing
>>>> device.
>>>>
>>>> Thanks
>>>> Martin
>>>>
>>>> On 08/25/2016 12:41 PM, Simon Weller wrote:
>>>>
>>>>
>>>> Martin,
>>>>
>>>> Can you provide more detail about your haproxy setup?
>>>> Are you running it on separate servers, or on the management server itself?
>>>>
>>>> - Si
>>>>
>>>> Simon Weller/ENA
>>>> (615) 312-6068
>>>>
>>>> [email protected]<mailto:[email protected]>
>>>> www.shapeblue.com<http://www.shapeblue.com>
>>>> 53 Chandos Place, Covent Garden, London WC2N 4HSUK
>>>> @shapeblue
>>>>
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: martin kolly
>>>> [[email protected]<mailto:[email protected]><mailto:[email protected]><mailto:[email protected]>]
>>>> Received: Thursday, 25 Aug 2016, 5:04AM
>>>> To: Rohit Yadav
>>>> [[email protected]<mailto:[email protected]><mailto:[email protected]><mailto:[email protected]>];
>>>>
>>>> [email protected]<mailto:[email protected]><mailto:[email protected]><mailto:[email protected]>
>>>>
>>>> [[email protected]<mailto:[email protected]><mailto:[email protected]><mailto:[email protected]>]
>>>> Subject: Re: CS 4.9 NIO Selector wait time PR-1601
>>>>
>>>>
>>>> thanks for your reply.
>>>>
>>>> This morning we repeated the upgrade process from 4.8 to 4.9 with the
>>>> following repository:
>>>> http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/.
>>>> <http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/><http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/>
>>>> Unfortunately we run into the same issue:
>>>>
>>>> /2016-08-25 09:49:00,660 INFO [utils.nio.NioClient] (main:null)
>>>> (logid:) Connecting to 10.100.12.10:8250//
>>>> //2016-08-25 09:49:00,668 WARN [utils.nio.Link] (main:null) (logid:)
>>>> This SSL engine was forced to close inbound due to end of stream.//
>>>> //2016-08-25 09:49:00,668 ERROR [utils.nio.NioClient] (main:null)
>>>> (logid:) SSL Handshake failed while connecting to host: 10.100.12.10
>>>> port: 8250//
>>>> //2016-08-25 09:49:00,668 ERROR [utils.nio.NioConnection] (main:null)
>>>> (logid:) Unable to initialize the threads.//
>>>> //java.io.IOException: SSL Handshake failed while connecting to host:
>>>> 10.100.12.10 port: 8250//
>>>> // at com.cloud.utils.nio.NioClient.init(NioClient.java:67)//
>>>> // at com.cloud.utils.nio.NioConnection.start(NioConnection.java:88)//
>>>> // at com.cloud.agent.Agent.start(Agent.java:237)//
>>>> // at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:399)//
>>>> // at
>>>> com.cloud.agent.AgentShell.launchAgentFromClassInfo(AgentShell.java:367)//
>>>> // at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:351)//
>>>> // at com.cloud.agent.AgentShell.start(AgentShell.java:456)//
>>>> // at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)//
>>>> // at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)//
>>>> // at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)//
>>>> // at java.lang.reflect.Method.invoke(Method.java:606)//
>>>> // at
>>>> org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)//
>>>> //2016-08-25 09:49:00,669 INFO [utils.exception.CSExceptionErrorCode]
>>>> (main:null) (logid:) Could not find exception:
>>>> com.cloud.utils.exception.NioConnectionException in error code list for
>>>> exceptions//
>>>> //2016-08-25 09:49:00,669 WARN [cloud.agent.Agent] (main:null) (logid:)
>>>> NIO Connection Exception
>>>> com.cloud.utils.exception.NioConnectionException: SSL Handshake failed
>>>> while connecting to host: 10.100.12.10 port: 8250//
>>>> //2016-08-25 09:49:00,670 INFO [cloud.agent.Agent] (main:null) (logid:)
>>>> Attempted to connect to the server, but received an unexpected
>>>> exception, trying again.../
>>>>
>>>> *KVM Hosts:
>>>> */# java -version
>>>> java version "1.7.0_95"
>>>> OpenJDK Runtime Environment (IcedTea 2.6.4) (7u95-2.6.4-0ubuntu0.14.04.1)
>>>> OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)
>>>>
>>>> # dpkg --get-selections | grep -e 'jdk' -e 'java'
>>>> ca-certificates-java install
>>>> java-common install
>>>> libcommons-daemon-java install
>>>> openjdk-7-jre-headless:amd64 install
>>>> tzdata-java install
>>>>
>>>> # apt-cache policy cloudstack-agent
>>>> cloudstack-agent:
>>>> Installed: 4.9.0
>>>> Candidate: 4.9.0
>>>> Version table:
>>>> *** 4.9.0 0
>>>> 500
>>>> http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/ ./ Packages
>>>> 100 /var/lib/dpkg/status
>>>>
>>>> # find /usr/share/ -name "cloud-utils*.jar"
>>>> /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar
>>>> # md5sum /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar
>>>> a8de7306d7c80b5a73e93b83afdd119f
>>>> /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar
>>>>
>>>>
>>>> /*Management Server:
>>>> */# java -version//
>>>> //java version "1.7.0_95"//
>>>> //OpenJDK Runtime Environment (IcedTea 2.6.4)
>>>> (7u95-2.6.4-0ubuntu0.14.04.1)//
>>>> //OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)//
>>>> //
>>>> //# dpkg --get-selections | grep -e 'jdk' -e 'java'//
>>>> //ca-certificates-java install//
>>>> //java-common install//
>>>> //libcommons-collections3-java install//
>>>> //libcommons-daemon-java install//
>>>> //libcommons-dbcp-java install//
>>>> //libcommons-pool-java install//
>>>> //libecj-java install//
>>>> //libgeronimo-jta-1.1-spec-java install//
>>>> //libmysql-java install//
>>>> //libservlet2.5-java install//
>>>> //libtomcat6-java install//
>>>> //openjdk-7-jre-headless:amd64 install//
>>>> //tzdata-java install//
>>>> //
>>>> //# apt-cache policy cloudstack-management//
>>>> //cloudstack-management://
>>>> // Installed: 4.9.0//
>>>> // Candidate: 4.9.0//
>>>> // Version table://
>>>> // *** 4.9.0 0//
>>>> // 500
>>>> http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/ ./ Packages//
>>>> // 100 /var/lib/dpkg/status///
>>>>
>>>> /# find /usr/share/ -name "cloud-utils*.jar"//
>>>> ///usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar//
>>>> ///usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar//
>>>> ///usr/share/cloudstack-usage/lib/cloud-utils-4.9.0.jar//
>>>> //# md5sum
>>>> /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar//
>>>> //a8de7306d7c80b5a73e93b83afdd119f
>>>> /usr/share/cloudstack-management/webapps/client/WEB-INF/lib/cloud-utils-4.9.0.jar//
>>>> //# md5sum /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar//
>>>> //a8de7306d7c80b5a73e93b83afdd119f
>>>> /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar//
>>>> //# md5sum /usr/share/cloudstack-usage/lib/cloud-utils-4.9.0.jar//
>>>> //a8de7306d7c80b5a73e93b83afdd119f
>>>> /usr/share/cloudstack-usage/lib/cloud-utils-4.9.0.jar/
>>>>
>>>> The classpath.conf was not modified:
>>>> /# cat /etc/cloudstack/management/classpath.conf
>>>> #!/bin/bash
>>>> #...
>>>>
>>>> SYSTEMJARS=""
>>>> SCP=$(build-classpath $SYSTEMJARS 2>/dev/null) ; if [ $? != 0 ] ; then
>>>> export SCP="" ; fi
>>>> MCP=""
>>>> DCP="/usr/share/tomcat6/bin/bootstrap.jar:/usr/share/tomcat6/bin/tomcat-juli.jar"
>>>> CLASSPATH=$SCP:$DCP:$MCP:/etc/cloudstack/management:/usr/share/cloudstack-management/setup
>>>> for jarfile in ""/* ; do
>>>> if [ ! -e "$jarfile" ] ; then continue ; fi
>>>> CLASSPATH=$jarfile:$CLASSPATH
>>>> done
>>>> for plugin in ""/* ; do
>>>> if [ ! -e "$plugin" ] ; then continue ; fi
>>>> CLASSPATH=$plugin:$CLASSPATH
>>>> done
>>>> for vendorconf in "/etc/cloudstack/management"/vendor/* ; do
>>>> if [ ! -d "$vendorconf" ] ; then continue ; fi
>>>> CLASSPATH=$vendorconf:$CLASSPATH
>>>> done
>>>> export CLASSPATH
>>>> if ([ -z "$JAVA_HOME" ] || [ ! -d "$JAVA_HOME" ]) && [ -d
>>>> /usr/lib/jvm/jre-1.7.0 ]; then
>>>> export JAVA_HOME=/usr/lib/jvm/jre-1.7.0
>>>> fi
>>>> PATH=$JAVA_HOME/bin:/sbin:/usr/sbin:$PATH
>>>> export PATH/
>>>>
>>>> Regards
>>>> Martin
>>>>
>>>> On 08/24/2016 06:56 PM, Rohit Yadav wrote:
>>>>
>>>>
>>>> Martin,
>>>>
>>>>
>>>> Were you able to fix your issue after installing packages from the
>>>> repo Will shared and restarting the services?
>>>>
>>>> I've not personally tested the apt-get.eu repo, but I had earlier
>>>> built this repo which I'm personally using in my local KVM-trusty
>>>> based cloud: http://packages.shapeblue.com/cloudstack/upstream/debian/4.9/
>>>>
>>>>
>>>> If you're still getting the error, can you share the JRE version
>>>> you're running, both on the mgmt server and on the KVM hosts? You can
>>>> run java -version, or share output of "dpkg --get-selections | grep -e
>>>> 'jdk' -e 'java'". Are you running CloudStack with any additional plugins?
>>>>
>>>> From the logs, looks like there are mixed jar files,
>>>> NioConnectionException class was not found -- something's wrong with
>>>> your installation. there must be a cloud-utils jar file make sure your
>>>> installation don't have multiple copies/versions of jars
>>>> (somewhere) in the in /usr/share/cloudstack-common and in
>>>> /usr/share/cloudstack-management/webapps/client/ paths:
>>>>
>>>> Could not find exception:
>>>> com.cloud.utils.exception.NioConnectionException in error code list for
>>>> exceptions
>>>> The error "Unable to initialize the threads." suggests, JVM was not
>>>> able to spawn threads. I would like to know your JRE version and any
>>>> other settings configured in /etc/cloudstack/management/classpath.conf
>>>> (and there are bunch of other files where JAVA_OPTS might have been
>>>> overridden). Note: For now you should only be using JRE1.7.
>>>>
>>>>
>>>> Regards.
>>>>
>>>> [email protected]<mailto:[email protected]><mailto:[email protected]><mailto:[email protected]>
>>>> www.shapeblue.com<http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com>
>>>> @shapeblue
>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From:* martin kolly
>>>> <[email protected]><mailto:[email protected]><mailto:[email protected]><mailto:[email protected]>
>>>> *Sent:* 24 August 2016 19:53:26
>>>> *To:*
>>>> [email protected]<mailto:[email protected]><mailto:[email protected]><mailto:[email protected]>;
>>>> Rohit Yadav
>>>> *Subject:* Re: CS 4.9 NIO Selector wait time PR-1601
>>>>
>>>> Thanks Will!
>>>>
>>>> yes the repo is pointing to 4.9 release for all KVMs and for the
>>>> management server:
>>>> /cloudstack:~# cat /etc/apt/sources.list.d/cloudstack.list //
>>>> //deb http://cloudstack.apt-get.eu/ubuntu trusty 4.9/
>>>>
>>>> All KVM agents and the mgmt server are upgraded to release 4.9 based
>>>> on the documentation.We have restarted all the cloudstack-agents and
>>>> the cloudstack-management service as well.
>>>>
>>>> Network traces are showing packets from KVM <-> Mgmt on port 8250.
>>>> there is no security device in between.
>>>>
>>>> thanks
>>>> fanfarlo
>>>>
>>>>
>>>>
>>>>
>>>> On 08/24/2016 04:13 PM, Will Stevens wrote:
>>>>
>>>>
>>>> @rohit, I am guessing they should be installing the cloudstack-agent using
>>>> the following repo right? That is what is described in the upgrade (trusty
>>>> instead of precise though).
>>>>
>>>> http://cloudstack.apt-get.eu/ubuntu/dists/trusty/4.9/
>>>>
>>>> @fanfarlo, are your repo's setup to point to the new 4.9 version?
>>>>
>>>> cheers,
>>>>
>>>> will
>>>>
>>>> On Wed, Aug 24, 2016 at 9:46 AM, Rohit Yadav
>>>> <[email protected]><mailto:[email protected]><mailto:[email protected]><mailto:[email protected]>
>>>> wrote:
>>>>
>>>>
>>>>
>>>> The PR and fix already exists in 4.9.0 release. Please make sure to
>>>> upgrade all of your management server(s) and KVM agents and then also
>>>> restart them after the upgrade.
>>>>
>>>>
>>>> If you are seeing SSL handshake failures, it could be due to network or
>>>> security issue and most likely due to mismatch between CloudStack mgmt
>>>> server and KVM agent version.
>>>>
>>>>
>>>> Regards.
>>>>
>>>> [email protected]<mailto:[email protected]><mailto:[email protected]><mailto:[email protected]>
>>>> www.shapeblue.com<http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com>
>>>> @shapeblue
>>>>
>>>>
>>>>
>>>> ------------------------------
>>>> *From:* Will Stevens
>>>> <[email protected]><mailto:[email protected]><mailto:[email protected]><mailto:[email protected]>
>>>> *Sent:* 24 August 2016 18:17:17
>>>> *To:*
>>>> [email protected]<mailto:[email protected]><mailto:[email protected]><mailto:[email protected]>;
>>>> Rohit Yadav
>>>> *Subject:* Re: CS 4.9 NIO Selector wait time PR-1601
>>>>
>>>>
>>>> That PR is already merged, so you don't have to do anything to get that
>>>> code, you already have it.
>>>>
>>>> @rohit, can you review this? I think this is a similar to the issue Simon
>>>> reported earlier.
>>>>
>>>> Will
>>>>
>>>> On Aug 24, 2016 6:56 AM, "fanfarlo"
>>>> <[email protected]><mailto:[email protected]><mailto:[email protected]><mailto:[email protected]>
>>>> wrote:
>>>>
>>>>
>>>>
>>>> hi all
>>>>
>>>> We have the following environment:
>>>> - OS: Debian 14.04 (hypervisors and management)
>>>> - 4 KVM Hosts
>>>> - Cloudstack Release 4.9 with local database
>>>>
>>>> Since we upgraded to Release 4.9 the KVM hosts no longer connect to the
>>>> management Server. Upgrade procedure was followed as described:
>>>> http://docs.cloudstack.apache.org/projects/cloudstack-releas
>>>> e-notes/en/4.9.0/upgrade/upgrade-4.8.html
>>>>
>>>>
>>>> On the KVM hosts we have the following error message:
>>>> /2016-08-24 10:42:49,678 INFO [utils.exception.CSExceptionErrorCode]
>>>> (main:null) (logid:) Could not find exception:
>>>> com.cloud.utils.exception.NioConnectionException in error code list for
>>>> exceptions
>>>> 2016-08-24 10:42:49,678 WARN [cloud.agent.Agent] (main:null) (logid:)
>>>> NIO Connection Exception
>>>> com.cloud.utils.exception.NioConnectionException: SSL Handshake failed
>>>> while connecting to host: 10.100.12.10 port: 8250
>>>> 2016-08-24 10:42:49,678 INFO [cloud.agent.Agent] (main:null) (logid:)
>>>> Attempted to connect to the server, but received an unexpected
>>>> exception, trying again...
>>>> 2016-08-24 10:42:54,679 INFO [utils.nio.NioClient] (main:null) (logid:)
>>>> Connecting to 10.100.12.10:8250
>>>> 2016-08-24 10:42:54,684 WARN [utils.nio.Link] (main:null) (logid:) This
>>>> SSL engine was forced to close inbound due to end of stream.
>>>> 2016-08-24 10:42:54,684 ERROR [utils.nio.NioClient] (main:null) (logid:)
>>>> SSL Handshake failed while connecting to host: 10.100.12.10 port: 8250
>>>> 2016-08-24 10:42:54,685 ERROR [utils.nio.NioConnection] (main:null)
>>>> (logid:) Unable to initialize the threads.
>>>> java.io.IOException: SSL Handshake failed while connecting to host:
>>>> 10.100.12.10 port: 8250
>>>> at com.cloud.utils.nio.NioClient.init(NioClient.java:67)
>>>> at com.cloud.utils.nio.NioConnection.start(NioConnection.java:88)
>>>> at com.cloud.agent.Agent.start(Agent.java:237)
>>>> at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:399)
>>>> at
>>>> com.cloud.agent.AgentShell.launchAgentFromClassInfo(AgentShell.java:367)
>>>> at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:351)
>>>> at com.cloud.agent.AgentShell.start(AgentShell.java:456)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>>>> ssorImpl.java:57)
>>>> at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>>>> thodAccessorImpl.java:43)
>>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>>> at
>>>> org.apache.commons.daemon.support.DaemonLoader.start(DaemonL
>>>> oader.java:243)
>>>> 2016-08-24 10:42:54,685 INFO [utils.exception.CSExceptionErrorCode]
>>>> (main:null) (logid:) Could not find exception:
>>>> com.cloud.utils.exception.NioConnectionException in error code list for
>>>> exceptions
>>>> 2016-08-24 10:42:54,685 WARN [cloud.agent.Agent] (main:null) (logid:)
>>>> NIO Connection Exception
>>>> com.cloud.utils.exception.NioConnectionException: SSL Handshake failed
>>>> while connecting to host: 10.100.12.10 port: 8250
>>>> 2016-08-24 10:42:54,686 INFO [cloud.agent.Agent] (main:null) (logid:)
>>>> Attempted to connect to the server, but received an unexpected
>>>> exception, trying again.../
>>>>
>>>>
>>>> Port is open on the management server, there is no firewall in between.
>>>> We found that there was a bug report here:
>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-9348. There is a PR
>>>> changing the NIO Selector wait time:
>>>> https://github.com/apache/cloudstack/pull/1601 which was merged into
>>>> master branch.
>>>>
>>>> Since we installed Release 4.9 we probably need to patch the
>>>> NioConection.class as described in PR1601 , right?
>>>>
>>>> kvm03# unzip -v /usr/share/cloudstack-agent/lib/cloud-utils-4.9.0.jar |
>>>> grep NioConnection
>>>> 3923 Defl:N 1778 55% 2016-08-02 09:28 05aaf7d5
>>>> com/cloud/utils/nio/NioConnection$1.class
>>>> 881 Defl:N 495 44% 2016-08-02 09:28 e378984c
>>>> com/cloud/utils/nio/NioConnection$ChangeRequest.class
>>>> 15410 Defl:N 7130 54% 2016-08-02 09:28 b3281f5a
>>>> com/cloud/utils/nio/NioConnection.class
>>>> 1134 Defl:N 584 49% 2016-08-02 09:28 8d5cb4a8
>>>> com/cloud/utils/exception/NioConnectionException.class
>>>>
>>>> Due to a lack of java expertise we have some basic questions:
>>>> - Is there a patched jar file available ? public build server?
>>>> - Do we need to create the jar from sources ? procedure?
>>>> - How do we apply the patch ?
>>>>
>>>> many thanks!
>>>> fanfarlo
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> [email protected]<mailto:[email protected]><mailto:[email protected]><mailto:[email protected]>
>>>> www.shapeblue.com<http://www.shapeblue.com><http://www.shapeblue.com><http://www.shapeblue.com>
>>>> 53 Chandos Place, Covent Garden, London WC2N 4HSUK
>>>> @shapeblue
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>> [email protected]
>>> www.shapeblue.com<http://www.shapeblue.com>
>>> 53 Chandos Place, Covent Garden, London WC2N 4HSUK
>>> @shapeblue
>>>
>>>
>>>
>>>
>
> [email protected]
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London WC2N 4HSUK
> @shapeblue
>
>
>
>