GitHub user borisroman opened a pull request:
https://github.com/apache/cloudstack/pull/863
[BLOCKER][4.6]CLOUDSTACK-8883: Resolved connect/reconnect issue.
Hi!
@wilderrodrigues, by implementing Callable you changed a couple of methods
and fields. I changed them some more!
The Agent failed to reconnect for two reasons.
Problem 1: The selector was blocking.
In the while loop at [1], _selector.select(); blocked when the connection
was lost. As a result, _isStartup = false; at [2] was never executed.
Therefore, the call to isStartup() at [3] always returned true, resulting in
an infinite loop.
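A minimal, hypothetical Java reduction of Problem 1 (not the actual NioConnection code): a worker thread loops on Selector.select() and clears a stand-in startup flag only after the loop exits. With nothing registered and no wakeup() call, select() does not return, so the flag is still true after the wait. The class name, the flag and flagStillSetAfter() are illustrative assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.Selector;

// Hypothetical reduction of the blocking-selector pattern; not CloudStack code.
public class BlockingSelectDemo {
    // Stands in for _isStartup; only cleared after the selector loop exits.
    private static volatile boolean startup;

    /** Returns true if the flag is still set after waiting waitMillis for the worker. */
    public static boolean flagStillSetAfter(long waitMillis) {
        try {
            Selector selector = Selector.open();
            startup = true;
            Thread worker = new Thread(() -> {
                try {
                    while (selector.isOpen()) {
                        // Nothing is registered and nobody calls wakeup(), so this blocks.
                        selector.select();
                    }
                } catch (IOException | ClosedSelectorException e) {
                    // The selector was closed while (or before) selecting.
                }
                startup = false; // stands in for "_isStartup = false;" at [2]
            });
            worker.setDaemon(true);
            worker.start();
            worker.join(waitMillis);   // wait a bounded time for the loop to finish
            boolean stillSet = startup;
            selector.close();          // release the worker so the demo can exit
            return stillSet;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("flag still set after 200 ms: " + flagStillSetAfter(200));
    }
}
```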
Resolution 1: Move the call to cleanUp() [4] so that it runs before checking
whether isStartup() has turned false. cleanUp() will close() the _selector,
which results in _isStartup being set to false.
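A hypothetical Java sketch of Resolution 1, with stand-in names rather than the real Agent/NioConnection classes: cleanUp() here only closes the selector, and per the java.nio.channels.Selector documentation, close() wakes a thread blocked in select(). Calling it before polling isStartup() therefore lets the worker loop exit and the flag turn false. The polling loop is bounded only so the demo cannot hang.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.Selector;

// Hypothetical stand-in for the Agent/NioConnection interplay; not CloudStack code.
public class CleanUpFirstDemo {
    private final Selector selector;
    private volatile boolean startup; // stands in for _isStartup

    private CleanUpFirstDemo() throws IOException {
        selector = Selector.open();
    }

    // Worker loop: blocks in select() and clears the flag only once the loop exits.
    private void startWorker() {
        startup = true;
        Thread worker = new Thread(() -> {
            try {
                while (selector.isOpen()) {
                    selector.select();
                }
            } catch (IOException | ClosedSelectorException e) {
                // Selector closed concurrently; fall through and clear the flag.
            }
            startup = false;
        });
        worker.setDaemon(true);
        worker.start();
    }

    public boolean isStartup() {
        return startup;
    }

    // Stand-in for cleanUp(): Selector.close() wakes a thread blocked in select().
    public void cleanUp() throws IOException {
        selector.close();
    }

    /** Returns true if isStartup() turns false within timeoutMillis of cleanUp(). */
    public static boolean reconnectUnblocks(long timeoutMillis) {
        try {
            CleanUpFirstDemo conn = new CleanUpFirstDemo();
            conn.startWorker();
            Thread.sleep(50); // let the worker block in select()
            conn.cleanUp();   // Resolution 1: clean up before waiting on the flag
            long deadline = System.currentTimeMillis() + timeoutMillis;
            while (conn.isStartup()) { // bounded here only so the demo cannot hang
                if (System.currentTimeMillis() > deadline) {
                    return false;
                }
                Thread.sleep(10);
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("isStartup() turned false after cleanUp(): " + reconnectUnblocks(2000));
    }
}
```

Swapping the order (waiting on isStartup() first, then calling cleanUp()) would reproduce the hang, since nothing else wakes the blocked select().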
Problem 2: _isStartup and _isRunning were set to true even when init() threw
an exception (ConnectException).
The exception was caught, but only logged; no further action was taken. As a
result, _isStartup and _isRunning were still set to true, and the Agent
believed it had connected successfully even though it had not.
Resolution 2: Add a return statement to the catch block [5]. This way,
_isStartup and _isRunning are not set to true.
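The effect of Resolution 2 can be shown with a hypothetical, stripped-down start() in Java. Here init() is a stand-in that always throws java.net.ConnectException (assumed to be the exception type referred to above), System.err stands in for the logger, and the two flags stand in for _isStartup and _isRunning. Only the variant with return in the catch block leaves both flags false.

```java
import java.net.ConnectException;

// Hypothetical reduction of start(); not the actual NioConnection code.
public class StartGuardDemo {
    private volatile boolean startup; // stands in for _isStartup
    private volatile boolean running; // stands in for _isRunning

    // Stand-in for init(): simulates no server listening on the remote port.
    private void init() throws ConnectException {
        throw new ConnectException("Connection refused");
    }

    // Before the fix: the exception is only logged, then execution falls through.
    public void startWithoutReturn() {
        try {
            init();
        } catch (ConnectException e) {
            System.err.println("Unable to connect: " + e.getMessage());
        }
        startup = true;
        running = true;
    }

    // After the fix: returning from the catch block skips the flag updates.
    public void startWithReturn() {
        try {
            init();
        } catch (ConnectException e) {
            System.err.println("Unable to connect: " + e.getMessage());
            return;
        }
        startup = true;
        running = true;
    }

    public boolean isStartup() { return startup; }
    public boolean isRunning() { return running; }

    public static void main(String[] args) {
        StartGuardDemo before = new StartGuardDemo();
        before.startWithoutReturn();
        // prints "without return: startup=true, running=true"
        System.out.println("without return: startup=" + before.isStartup() + ", running=" + before.isRunning());

        StartGuardDemo after = new StartGuardDemo();
        after.startWithReturn();
        // prints "with return: startup=false, running=false"
        System.out.println("with return: startup=" + after.isStartup() + ", running=" + after.isRunning());
    }
}
```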
Steps to test:
1. Deploy ACS.
2. Try all combinations of stopping and starting the management server and the agent.
[1]https://github.com/borisroman/cloudstack/blob/b34f86c8d55a1cfc057585eab4db0fa2d98a7b3e/utils/src/main/java/com/cloud/utils/nio/NioConnection.java#L128
[2]https://github.com/borisroman/cloudstack/blob/b34f86c8d55a1cfc057585eab4db0fa2d98a7b3e/utils/src/main/java/com/cloud/utils/nio/NioConnection.java#L176
[3]https://github.com/borisroman/cloudstack/blob/b34f86c8d55a1cfc057585eab4db0fa2d98a7b3e/agent/src/com/cloud/agent/Agent.java#L404
[4]https://github.com/borisroman/cloudstack/blob/b34f86c8d55a1cfc057585eab4db0fa2d98a7b3e/agent/src/com/cloud/agent/Agent.java#L399
[5]https://github.com/borisroman/cloudstack/blob/b34f86c8d55a1cfc057585eab4db0fa2d98a7b3e/utils/src/main/java/com/cloud/utils/nio/NioConnection.java#L91
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/borisroman/cloudstack CLOUDSTACK-8883
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/cloudstack/pull/863.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #863
----
commit 9693b97c2147b3fdb9579a1ebb33597cd3bf1d11
Author: Boris Schrijver
Date: 2015-09-21T14:54:56Z
Call cleanUp() before looping isStartup().
commit b34f86c8d55a1cfc057585eab4db0fa2d98a7b3e
Author: Boris Schrijver
Date: 2015-09-21T22:38:16Z
Added return statement to stop start() if there has been an
ConnectException.
----