[ https://issues.apache.org/jira/browse/CLOUDSTACK-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15290884#comment-15290884 ]
ASF GitHub Bot commented on CLOUDSTACK-9348:
--------------------------------------------
Github user rhtyd commented on the pull request:
https://github.com/apache/cloudstack/pull/1549#issuecomment-220289585
@swill I was able to reproduce the addHost issue and have fixed it. I was able
to deploy against a KVM env with two clusters with Ubuntu and CentOS based
hosts in each. (Note that addHost can fail silently if we try to mix CentOS and
Ubuntu KVM hosts in the same cluster.)

If possible, I would request that this bugfix be included in the 4.9.0 release,
as it addresses the following:
- it fixes critical NIO reconnection issues and improves SSL handling by making
it non-blocking
- without this fix, agents (KVM, systemvm etc.) can take a long time (minutes
to hours) to reconnect
- the management server's agent handler on port 8250 can be blocked by a
single telnet (or similar malicious) client. To test, on an existing deployment
without this patch, connect a telnet client to port 8250 of the mgmt server;
the mgmt server can then no longer immediately handle any agent connection (try
restarting the KVM agent, for example).
> CloudStack Server degrades when a lot of connections on port 8250
> -----------------------------------------------------------------
>
> Key: CLOUDSTACK-9348
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9348
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the
> default.)
> Reporter: Rohit Yadav
> Assignee: Rohit Yadav
> Fix For: 4.9.0
>
>
> An intermittent issue was found with a large CloudStack deployment, where
> servers could not keep agents connected on port 8250.
> All connections are handled by accept() in NioConnection:
> https://github.com/apache/cloudstack/blob/master/utils/src/main/java/com/cloud/utils/nio/NioConnection.java#L125
> A new connection is handled by accept() which does blocking SSL handshake. A
> good fix would be to make this non-blocking and handle expensive tasks in
> separate threads/pool. This way the main IO loop won't be blocked and can
> continue to serve other agents/clients.
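
The approach described in the issue (a non-blocking accept in the main IO loop, with the expensive per-connection work moved to a thread pool) could look roughly like the sketch below. This is not the code from the pull request: the class `NonBlockingAcceptSketch`, the `Handshake` interface, the pool size of 4, and the one-byte "handshake" used in `main` are all hypothetical stand-ins chosen for illustration. A real implementation would perform the SSL handshake where `Handshake.run` is called.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class NonBlockingAcceptSketch {
    // Hypothetical stand-in for expensive per-connection work such as an SSL handshake.
    interface Handshake { void run(SocketChannel ch) throws Exception; }

    private final Selector selector;
    private final ServerSocketChannel server;
    private final ExecutorService workers = Executors.newFixedThreadPool(4); // assumed size
    private final Handshake handshake;
    final AtomicInteger completed = new AtomicInteger();
    private volatile boolean running = true;

    NonBlockingAcceptSketch(int port, Handshake handshake) throws IOException {
        this.handshake = handshake;
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.configureBlocking(false);               // accept() must never block the loop
        server.bind(new InetSocketAddress("127.0.0.1", port));
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() { return server.socket().getLocalPort(); }

    void stop() { running = false; selector.wakeup(); }

    // Main IO loop: only accepts connections; the handshake runs on a worker thread.
    void loop() throws IOException {
        try {
            while (running) {
                selector.select(100);
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid() || !key.isAcceptable()) continue;
                    SocketChannel ch = server.accept();
                    if (ch == null) continue;          // nothing pending after all
                    workers.submit(() -> {
                        try {
                            handshake.run(ch);
                            completed.incrementAndGet();
                        } catch (Exception e) {
                            try { ch.close(); } catch (IOException ignored) { }
                        }
                    });
                }
            }
        } finally {
            workers.shutdownNow();                     // interrupts handshakes still waiting
            server.close();
            selector.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Toy "handshake": wait for one byte from the peer (an idle client never sends it).
        NonBlockingAcceptSketch s = new NonBlockingAcceptSketch(0, ch -> {
            if (ch.read(ByteBuffer.allocate(1)) < 0) throw new IOException("peer closed");
        });
        Thread io = new Thread(() -> {
            try { s.loop(); } catch (IOException e) { e.printStackTrace(); }
        });
        io.start();
        try (Socket idle = new Socket("127.0.0.1", s.port());     // connects, sends nothing
             Socket active = new Socket("127.0.0.1", s.port())) { // a well-behaved client
            active.getOutputStream().write(1);
            long deadline = System.currentTimeMillis() + 5000;
            while (s.completed.get() < 1 && System.currentTimeMillis() < deadline) {
                Thread.sleep(20);
            }
            System.out.println("handshakes completed while one client idles: "
                    + s.completed.get());
        }
        s.stop();
        io.join();
    }
}
```

In this sketch an idle client ties up one worker thread rather than the accept loop, so other clients are still served. A fixed-size pool can itself be exhausted by enough idle clients, so a handshake timeout would probably also be needed in practice.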