[ https://issues.apache.org/jira/browse/CASSANDRA-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845638#comment-13845638 ]

Tyler Hobbs commented on CASSANDRA-4288:
----------------------------------------

bq. Quentin Conner cited 30-40 samples as being statistically significant, although I'm not sure where that number comes from either.

That's only for a statistically significant measurement of variance for FD purposes. In terms of gossip, we're relying on other nodes already having made accurate FD judgements, so what Chris has here should be sufficient.

bq. I used 5s as the min wait instead of 15s. I don't really care if that's a constant or fraction or RING_DELAY or whatever color people like best.

A constant is fine, but just to fit within the project coding style, these should probably be {{static final int GOSSIP_SETTLE_MIN_WAIT_MS}}, etc.

Minor comments on the code:
* The warning log should only happen after {{gossipSettlePollSuccessRequired}} rounds of checking.
* You can use variadic arguments for the warning log instead of an array.
* {{totalPolls}} should be incremented prior to the warning log.
* You can just throw a RuntimeException after catching the InterruptedException.

Other than that, I think it's good.

> prevent thrift server from starting before gossip has settled
> -------------------------------------------------------------
>
>         Key: CASSANDRA-4288
>         URL: https://issues.apache.org/jira/browse/CASSANDRA-4288
>     Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>    Reporter: Peter Schuller
>    Assignee: Chris Burroughs
>     Fix For: 2.0.4
>
> Attachments: CASSANDRA-4288-trunk.txt, j4288-1.2-v1-txt
>
>
> A serious problem is that there is no co-ordination whatsoever between gossip
> and the consumers of gossip. In particular, on a large cluster with hundreds
> of nodes, it takes several seconds for gossip to settle because the gossip
> stage is CPU bound. This leads to a node starting up and accepting thrift
> traffic long before it has any clue of what is up and down. This leads to
> client-visible timeouts (for nodes that are down but not identified as such)
> and UnavailableException (for nodes that are up but not yet identified as
> such). This is really bad in general, but in particular for clients doing
> non-idempotent writes (counter increments).
> I was going to fix this as part of more significant re-writing in other
> tickets having to do with gossip/topology/etc, but that's not going to
> happen. So, the attached patch is roughly what we're running with in
> production now to make restarts bearable. The minimum wait time is both for
> ensuring that gossip has time to start becoming CPU bound if it will be, and
> the reason it's large is to allow for down nodes to be identified as such in
> most typical cases with a default phi conviction threshold (untested, we
> actually ran with a smaller number of 5 seconds minimum, but from past
> experience I believe 15 seconds is enough).
> The patch is tested on our 1.1 branch. It applies on trunk, and the diff is
> against trunk, but I have not tested it against trunk.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
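
For illustration only, the review points in Tyler Hobbs's comment (constants as {{static final int}}, a variadic warning log, incrementing {{totalPolls}} before logging, warning only after the required number of polls, and rethrowing the InterruptedException as a RuntimeException) might be sketched roughly as below. This is not the attached patch. Only {{GOSSIP_SETTLE_MIN_WAIT_MS}} and the 5s value come from the thread; the class name, the other two constants and their values, the {{looksSettled}} probe, and the {{warn}} helper (a stand-in for a logger that accepts variadic arguments) are assumptions made up for this sketch, and "warn once totalPolls has reached the required count" is just one reading of the first bullet.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not the code from the attached patch.
class GossipSettleSketch {
    static final int GOSSIP_SETTLE_MIN_WAIT_MS = 5000;          // name and 5s value from the thread
    static final int GOSSIP_SETTLE_POLL_INTERVAL_MS = 1000;     // assumed name and value
    static final int GOSSIP_SETTLE_POLL_SUCCESSES_REQUIRED = 3; // assumed name and value

    // Stand-in for a logger call that takes variadic arguments instead of an Object[].
    static void warn(String format, Object... args) {
        System.err.println("WARN " + String.format(format, args));
    }

    // Sleeps for the minimum wait, then polls until the probe has reported
    // "settled" successesRequired times in a row. Returns the number of polls made.
    static int waitForSettle(BooleanSupplier looksSettled, long minWaitMs,
                             long pollIntervalMs, int successesRequired) {
        int totalPolls = 0;
        int consecutiveOk = 0;
        try {
            Thread.sleep(minWaitMs);
            while (consecutiveOk < successesRequired) {
                Thread.sleep(pollIntervalMs);
                totalPolls++; // incremented before any logging, so the count is current
                if (looksSettled.getAsBoolean()) {
                    consecutiveOk++;
                } else {
                    consecutiveOk = 0; // a failed poll resets the streak
                    // Stay quiet until at least successesRequired polls have been made.
                    if (totalPolls >= successesRequired) {
                        warn("Gossip not settled after %d polls", totalPolls);
                    }
                }
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return totalPolls;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical probe that starts reporting "settled" on its third call.
        int polls = waitForSettle(() -> ++calls[0] >= 3, 10, 1,
                                  GOSSIP_SETTLE_POLL_SUCCESSES_REQUIRED);
        System.out.println("total polls: " + polls);
    }
}
```

The wait times are passed as parameters here only so the loop can be exercised quickly; a real caller would presumably pass the constants and probe the gossip stage itself.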