[ 
https://issues.apache.org/jira/browse/CASSANDRA-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204025#comment-13204025
 ] 

Peter Schuller commented on CASSANDRA-3834:
-------------------------------------------

Something I'd like to include in this ticket is a change to the behavior on 
start-up. Part of making the locator abstract will be moving things like 
waiting for RING_DELAY into the locator (rather than directly in e.g. 
StorageService). But I'd also like to add support for not listening on thrift 
until the node has deemed itself to be "up to speed" with the cluster state. 
It's a long-standing issue that on start-up, a node just starts listening 
without any kind of delay or synchronization with gossip (I am talking about 
normal start-up here, not bootstrap). This can cause a flurry of 
{{Unavailable}} exceptions when a node starts up in a cluster.

Because the impact is only {{Unavailable}}, as opposed to actual data integrity 
issues (since if everything works correctly we'll never route requests to nodes 
for which we don't have valid gossip information), we need not wait for 
RING_DELAY. But we can have a simple heuristic that minimizes the chance of 
this being a problem. For example (not super thought through, but a stab):

* Wait for *at least* one successful gossip round with some node *or* some 
number of seconds (say 10?), whichever comes first.
* Then, whatever ring state we are aware of, wait for *at least* a couple of 
seconds.
* Maybe wait an additional period of up to say 10 seconds if fewer than a 
certain percentage of nodes in the cluster are up.
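
To make the steps above a bit more concrete, here is a minimal Java sketch of 
them as a pure decision function. Everything in it is hypothetical: 
StartupGate, readyForClients and the constants are names made up for 
illustration and are not existing Cassandra code. The 2 second settle time and 
the 50% liveness threshold are assumed values standing in for "a couple of 
seconds" and "a certain percentage", and how the first successful gossip round 
and the live fraction would actually be obtained is left open.

```java
/** Hypothetical start-up gate; none of these names exist in Cassandra. */
final class StartupGate {
    // Assumed tunables, based on the rough numbers in the list above.
    static final long MAX_GOSSIP_WAIT_MS = 10_000; // step 1 cap ("say 10?")
    static final long SETTLE_MS = 2_000;           // step 2 ("a couple of seconds")
    static final long MAX_EXTRA_WAIT_MS = 10_000;  // step 3 cap
    static final double MIN_LIVE_FRACTION = 0.5;   // assumed "certain percentage"

    private StartupGate() {}

    /**
     * Decides whether the node may start accepting client requests.
     *
     * @param nowMs         milliseconds elapsed since start-up
     * @param firstGossipMs milliseconds after start-up at which the first
     *                      gossip round succeeded, or -1 if none has yet
     * @param liveFraction  fraction (0.0 to 1.0) of known nodes seen as up
     */
    static boolean readyForClients(long nowMs, long firstGossipMs, double liveFraction) {
        // Step 1: one successful gossip round, or the cap, whichever is first.
        long gossipDone = (firstGossipMs >= 0)
                ? Math.min(firstGossipMs, MAX_GOSSIP_WAIT_MS)
                : MAX_GOSSIP_WAIT_MS;
        if (nowMs < gossipDone) {
            return false;
        }

        // Step 2: let whatever ring state we have settle for a short while.
        long settled = gossipDone + SETTLE_MS;
        if (nowMs < settled) {
            return false;
        }

        // Step 3: if too few nodes look alive, hold off a bounded extra period.
        if (liveFraction >= MIN_LIVE_FRACTION) {
            return true;
        }
        return nowMs >= settled + MAX_EXTRA_WAIT_MS;
    }
}
```

A start-up thread could poll this every 100 ms or so and only start listening 
on thrift once it returns true. With these assumed values, a node whose first 
gossip round succeeds after 300 ms and which sees most of the ring as up would 
start listening about 2.3 seconds after start-up.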

An alternative is to do something maximally simple like "always just wait 10 
seconds", but that could grow annoying in the common cases where 0-2 seconds is 
actually enough.

                
> make locator pluggable/abstract
> -------------------------------
>
>                 Key: CASSANDRA-3834
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3834
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>
> Make the locator pluggable, such that we can use something other than
> gossip to discover ring topology.
> This is in part for CASSANDRA-3833, but is also useful because we want
> something simpler and more reliable than gossip on production
> clusters. At minimum we must be able to hit our nodes with a clue bat
> in emergencies ("no this node is NOT in state Normal damn it"), even if
> we were to normally rely on gossip.
