Aled Sage created BROOKLYN-174:
----------------------------------

             Summary: Cassandra process restart(): timed out, and left as state=starting
                 Key: BROOKLYN-174
                 URL: https://issues.apache.org/jira/browse/BROOKLYN-174
             Project: Brooklyn
          Issue Type: Bug
    Affects Versions: 0.9.0-SNAPSHOT
            Reporter: Aled Sage


I had an app with a single Cassandra node deployed to a private vcloud-director 
cloud (so Cassandra was accessed over NAT). I called the stop effector on the 
Cassandra entity to stop just its process, and then called restart().

It took about 6 minutes for Cassandra’s thrift port latency to report a 
non-negative value (i.e. for the poll to succeed), so the post-restart step 
failed. When the poll eventually did succeed, the entity was left in the 
“starting” state, with serviceUp=true.
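To make the "non-negative latency" condition concrete, here is a minimal Java sketch. It assumes the probe behaves like a timed TCP connect to the thrift port; Brooklyn's actual Cassandra latency check may be implemented differently (for example, via a Thrift client call). The class name `ThriftLatencyProbe`, the 2-second timeout and the host are illustrative assumptions; 9160 is Cassandra's default Thrift port. Because the connect timeout bounds each individual attempt, an unreachable port yields -1 within that timeout rather than blocking for minutes.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ThriftLatencyProbe {

    /**
     * Hypothetical probe: returns the time in milliseconds taken to open a TCP
     * connection to host:port, or -1 if the connection fails or does not
     * complete within timeoutMs. The timeout bounds this single attempt only.
     */
    public static long probeLatencyMillis(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000L;
        } catch (IOException e) {
            // Refused, unreachable, unknown host or timed out: report "no latency".
            return -1L;
        }
    }

    public static void main(String[] args) {
        // Assumed example target: a local Cassandra listening on the default Thrift port.
        long latency = probeLatencyMillis("127.0.0.1", 9160, 2000);
        System.out.println(latency >= 0 ? "reachable in " + latency + " ms" : "unreachable (-1)");
    }
}
```

A poller that treats -1 as "not yet up" and retries can then distinguish a slow-but-bounded attempt from one that hangs.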

Several things to investigate/fix:

* When polling for the thrift port, are we sensibly timing out on each 
individual poll attempt (rather than the first attempt taking many minutes to 
fail)?

* Should we increase the “start.timeout” default, which is 2 minutes?  
  (gut feel is that 2 minutes should be plenty!)

* Why did it stay in the “starting” state? Do we need to wrap the restart in a 
try-catch (or similar) so that a failure sets it to on-fire?
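The first and third questions can be illustrated together. The following is a minimal Java sketch, not Brooklyn's actual restart code. It polls a health check, which is expected to bound its own duration (as a socket connect timeout would), until an overall start timeout expires. It also wraps the restart so that any failure sets the state to `ON_FIRE` rather than leaving it at `STARTING`. The class `RestartSketch`, its `State` enum, the method names and the 200 ms poll interval are all hypothetical.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

public class RestartSketch {

    // Hypothetical lifecycle states, loosely mirroring those named in the issue.
    public enum State { STARTING, RUNNING, ON_FIRE }

    private volatile State state = State.STARTING;

    /**
     * Calls `check` repeatedly until it returns true or `overallTimeout` has elapsed.
     * Each call to `check` should bound its own duration, so that one hung attempt
     * cannot consume the whole budget. Returns true if the check succeeded in time.
     */
    static boolean waitFor(BooleanSupplier check, Duration overallTimeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + overallTimeout.toNanos();
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }

    /** Restart wrapper: any failure leaves the entity ON_FIRE instead of stuck in STARTING. */
    public void restart(Runnable stopAndLaunch, BooleanSupplier serviceCheck, Duration startTimeout) {
        state = State.STARTING;
        try {
            stopAndLaunch.run();
            if (!waitFor(serviceCheck, startTimeout, Duration.ofMillis(200))) {
                throw new IllegalStateException("service not up within " + startTimeout);
            }
            state = State.RUNNING;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            state = State.ON_FIRE;
        } catch (RuntimeException e) {
            state = State.ON_FIRE;
            throw e; // still surface the failure to the caller
        }
    }

    public State getState() {
        return state;
    }
}
```

Under these assumptions, a 6-minute delay like the one reported would fail the restart once the 2-minute start timeout expired and mark the entity on-fire. A later successful poll would then be a separate state transition, rather than the entity silently remaining in "starting".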



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
