[ https://issues.apache.org/jira/browse/CASSANDRA-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204025#comment-13204025 ]
Peter Schuller commented on CASSANDRA-3834:
-------------------------------------------

Something I'd like to include in this ticket is to make the behavior on start-up a bit different. Part of making the locator abstract will be moving things like waiting for RING_DELAY into the locator (not directly in e.g. StorageService). But I'd like to also add support for not listening to thrift until one has deemed oneself to be "up to speed" with the cluster state.

It's a long-standing issue that on start-up, a node just starts listening without any kind of delay or synchronization with gossip (I am talking about normal start-up here, not bootstraps). This can cause a flurry of {{Unavailable}} exceptions when a node starts up in a cluster.

Because the impact is only {{Unavailable}}, as opposed to actual data integrity issues (since if everything works correctly we'll never route requests to nodes for which we don't have valid gossip information), we need not wait for RING_DELAY. But we can have a simple heuristic that minimizes the chance of this being a problem.

For example (not fully thought through, but a stab):

* Wait for *at least* one successful gossip round to a node *or* some amount of seconds (say 10?).
* Then, whatever ring state we are aware of, wait for *at least* a couple of seconds.
* Maybe wait an additional period of up to, say, 10 seconds if fewer than a certain percentage of nodes in the cluster are up.

An alternative is to do something maximally simple like "always just wait 10 seconds", but that could grow annoying in the common cases where 0-2 seconds is actually enough.

> make locator pluggable/abstract
> -------------------------------
>
>                 Key: CASSANDRA-3834
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3834
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>
> Make the locator pluggable, such that we can use something other than
> gossip to discover ring topology.
> This is in part for CASSANDRA-3833, but is also useful because we want
> something simpler and more reliable than gossip on production
> clusters. At minimum we must be able to hit our nodes with a clue bat
> in emergencies ("no this node is NOT in state Normal damn it"), even if
> we were to normally rely on gossip.
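
For illustration only, the three-step start-up heuristic proposed in the comment above could be sketched roughly as follows. This is not Cassandra code: it is written in Python for brevity, and `StartupView`, `ready_for_clients`, and all parameter names are hypothetical. The 10-second values come from the comment; the 2-second settle time stands in for "a couple of seconds" and the 50% live-node threshold stands in for "a certain percentage", both of which are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartupView:
    """Hypothetical snapshot of what a starting node knows about the ring."""
    elapsed: float                    # seconds since start-up began
    first_gossip_at: Optional[float]  # time of first successful gossip round, if any
    known_nodes: int                  # nodes present in the ring state we have
    live_nodes: int                   # of those, nodes currently believed up


def ready_for_clients(
    view: StartupView,
    gossip_timeout: float = 10.0,     # "say 10?" in the comment
    settle: float = 2.0,              # assumed value for "a couple of seconds"
    min_live_fraction: float = 0.5,   # assumed value for "a certain percentage"
    extra_wait: float = 10.0,         # "up to say 10 seconds" in the comment
) -> bool:
    """Return True once the node may start listening for client requests."""
    # Step 1: wait for one successful gossip round, or give up after the timeout.
    if view.first_gossip_at is None:
        if view.elapsed < gossip_timeout:
            return False
        step1_done = gossip_timeout
    else:
        step1_done = min(view.first_gossip_at, gossip_timeout)

    # Step 2: whatever ring state we have, let it settle for a short while.
    if view.elapsed < step1_done + settle:
        return False

    # Step 3: if too few known nodes look up, wait a bounded extra period.
    if view.known_nodes > 0:
        live_fraction = view.live_nodes / view.known_nodes
        if live_fraction < min_live_fraction:
            return view.elapsed >= step1_done + settle + extra_wait

    return True
```

Under these assumed values, a node that gossips successfully within a fraction of a second and sees most of the ring up would start listening after roughly two seconds, while the worst case is bounded at about 22 seconds (10 + 2 + 10).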