[ https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764467#comment-17764467 ]
Cameron Zemek edited comment on CASSANDRA-18845 at 9/13/23 3:32 AM:
--------------------------------------------------------------------

I have attached a patch. I tested it as follows:
# Spin up a single-node cluster. This works because of the epSize == liveSize check, which lets it bypass the liveSize > 1 check.
# Spin up a 3-node cluster. All 3 nodes start NTR as expected.
# Shut down all nodes. Start the first node; it stays waiting in gossip because of the liveSize > 1 requirement.
# Start the second node. Now both nodes start NTR, since liveSize > 1 and there are no other incoming `is now UP` events, so gossip looks settled.

NOTE: I had to disable the if condition around the call to Gossiper.waitToSettle() since I was using loopback addresses.

> Waiting for gossip to settle on live endpoints
> ----------------------------------------------
>
>                 Key: CASSANDRA-18845
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18845
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Cameron Zemek
>            Priority: Normal
>         Attachments: 18845-3.11.patch
>
>
> This is a follow-up to CASSANDRA-18543.
> Although that ticket added the ability to set
> cassandra.gossip_settle_min_wait_ms, doing so is tedious and error-prone.
> On one node I observed a 79-second gap between waiting for gossip and the
> first echo response indicating that a node is UP.
> The problem is that we do not want to start Native Transport until gossip
> settles; otherwise queries can fail at consistency levels such as
> LOCAL_QUORUM, because the node thinks the replicas are still in the DOWN
> state.
> Instead of having to set gossip_settle_min_wait_ms, I am proposing that
> (outside a single-node cluster) the node wait for an UP message from
> another node before considering gossip settled. E.g.
> {code:java}
>             if (currentSize == epSize && currentLive == liveSize && liveSize > 1)
>             {
>                 logger.debug("Gossip looks settled.");
>                 numOkay++;
>             } {code}
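
For illustration only, the settle check exercised by the four test steps can be sketched as a standalone predicate. This is not the code from 18845-3.11.patch: the class GossipSettleSketch and the method looksSettled are hypothetical names, and the way the epSize == liveSize bypass is combined with the liveSize > 1 requirement is an assumption inferred from the test notes. The parameter names follow the snippet in the description, where epSize and liveSize are taken to be the counts from the previous poll and currentSize and currentLive the counts from the current poll.

```java
// Hypothetical sketch, not the attached patch: a pure function version of
// the "gossip looks settled" decision described in the test steps.
final class GossipSettleSketch
{
    /**
     * @param currentSize endpoint count seen on this poll
     * @param epSize      endpoint count seen on the previous poll
     * @param currentLive live endpoint count seen on this poll
     * @param liveSize    live endpoint count seen on the previous poll
     */
    static boolean looksSettled(int currentSize, int epSize, int currentLive, int liveSize)
    {
        // Nothing changed between two consecutive polls.
        boolean stable = currentSize == epSize && currentLive == liveSize;

        // At least one node other than this one has been marked UP.
        boolean anotherNodeIsUp = liveSize > 1;

        // Assumed bypass: every known endpoint is already live, which is
        // the only way a single-node cluster can ever settle.
        boolean allKnownEndpointsLive = epSize == liveSize;

        return stable && (anotherNodeIsUp || allKnownEndpointsLive);
    }

    public static void main(String[] args)
    {
        // Step 1: single-node cluster.
        System.out.println(looksSettled(1, 1, 1, 1));
        // Step 3: first node restarted alone, two known peers still DOWN.
        System.out.println(looksSettled(3, 3, 1, 1));
        // Step 4: second node has come up.
        System.out.println(looksSettled(3, 3, 2, 2));
    }
}
```

Under these assumptions, a node restarted alone in a previously formed 3-node cluster keeps waiting, while a genuine single-node cluster does not.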