Aled Sage created BROOKLYN-174:
----------------------------------
Summary: Cassandra process restart(): timed out, and left as
state=starting
Key: BROOKLYN-174
URL: https://issues.apache.org/jira/browse/BROOKLYN-174
Project: Brooklyn
Issue Type: Bug
Affects Versions: 0.9.0-SNAPSHOT
Reporter: Aled Sage
I had an app with a single Cassandra node deployed to a private vcloud-director
cloud (so accessing Cassandra over NAT). I called the stop effector on the
cassandra entity to just stop its process. I then called restart().
It took about 6 minutes for Cassandra’s thrift port latency to get a
non-negative value (i.e. for the poll to succeed). This meant the post-restart
step timed out and failed. When the poll did eventually succeed, the entity
was left in a “starting” state, with serviceUp=true.
Several things to investigate/fix:
* When polling for the thrift port, are we sensibly timing out on each
individual poll attempt (rather than the first attempt taking many minutes to
fail)?
* Should we increase the “start.timeout” default, which is 2 minutes?
(gut feel is that 2 minutes should be plenty!)
* Why did it stay as “starting”? Do we need to wrap the restart in a try-catch
or some such, so that it goes on-fire?
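
To make the first and third points concrete, here is a rough sketch of the
behaviour I would expect. It is not Brooklyn code: port_latency, wait_for_port,
restart, the dict standing in for the entity and the "running" state name are
all made up for illustration. Each poll attempt has its own short timeout, the
overall wait is bounded by the start timeout, and any failure moves the entity
to on-fire instead of leaving it as "starting".

```python
import socket
import time


def port_latency(host, port, attempt_timeout=5.0):
    """Connect latency in seconds, or -1 if this single attempt fails or times out."""
    start = time.monotonic()
    try:
        # The timeout bounds this one attempt only, so a slow or unreachable
        # port cannot hold up the whole poll loop for minutes.
        with socket.create_connection((host, port), timeout=attempt_timeout):
            return time.monotonic() - start
    except OSError:
        return -1


def wait_for_port(probe, overall_timeout=120.0, interval=1.0):
    """Call probe() until it returns a non-negative value or the deadline passes."""
    deadline = time.monotonic() + overall_timeout
    while True:
        if probe() >= 0:
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))


def restart(entity, do_restart, probe, start_timeout=120.0, interval=1.0):
    """Restart wrapper: the entity ends as "running" or "on-fire", never "starting"."""
    entity["state"] = "starting"
    try:
        do_restart()
        if not wait_for_port(probe, overall_timeout=start_timeout, interval=interval):
            raise TimeoutError("service not up within start timeout")
        entity["state"] = "running"
    except Exception:
        # Any failure, including the timeout above, marks the entity as failed.
        entity["state"] = "on-fire"
        raise
```

A caller would pass something like lambda: port_latency(host, port) as the
probe, with whatever host and port the entity is reachable on through the NAT.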