[ https://issues.apache.org/jira/browse/CASSANDRA-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339022#comment-14339022 ]
Brandon Williams commented on CASSANDRA-8336:
---------------------------------------------

bq. If hit 'Unable to gossip with any seeds' on replace, it shuts down the gossiper.

Do you have the stacktrace where this is happening? I have a feeling we're going to end up in checked-exception hell trying to fix this, since we throw RuntimeException there (to avoid such a hell, in fact).

> Quarantine nodes after receiving the gossip shutdown message
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-8336
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8336
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 2.0.13
>
>         Attachments: 8336-v2.txt, 8336-v3.txt, 8336.txt
>
>
> In CASSANDRA-3936 we added a gossip shutdown announcement. The problem here
> is that this isn't sufficient; you can still get TOEs (TimedOutExceptions)
> and have to wait on the FD (failure detector) to figure things out. This
> happens due to gossip propagation time and variance: if node X shuts down and
> sends the message to Y, but Z has a greater gossip version than Y for X and
> has not yet received the message, Z can initiate gossip with Y and thus mark
> X alive again. I propose quarantining to solve this; however, I feel it
> should be a -D parameter you have to specify, so as not to destroy current
> dev and test practices, since this will mean a node that shuts down will not
> be able to restart until the quarantine expires.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
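The quarantine proposed in the issue description can be sketched as follows. This is a minimal illustration under stated assumptions, not Cassandra's Gossiper code: the class `QuarantineSketch`, its methods, and the system property `example.shutdown_quarantine_ms` are all hypothetical, and "alive evidence" stands in for the real gossip heartbeat/version comparison. It shows node Y ignoring Z's stale "X is alive" state until the quarantine window opened by X's shutdown message expires, with the feature off (0 ms) unless the -D property is passed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of quarantine-after-shutdown; NOT Cassandra's Gossiper API.
class QuarantineSketch {
    // Hypothetical -D property name (an assumption, not a real Cassandra flag).
    static final String QUARANTINE_PROPERTY = "example.shutdown_quarantine_ms";

    private final long quarantineMillis;
    // Node id -> time (ms) until which "alive" evidence for that node is ignored.
    private final Map<String, Long> quarantinedUntil = new HashMap<>();
    private final Map<String, Boolean> alive = new HashMap<>();

    QuarantineSketch(long quarantineMillis) {
        this.quarantineMillis = quarantineMillis;
    }

    // Opt-in: quarantine is disabled (0 ms) unless -Dexample.shutdown_quarantine_ms=... is set.
    static QuarantineSketch fromSystemProperty() {
        return new QuarantineSketch(Long.getLong(QUARANTINE_PROPERTY, 0L));
    }

    // Y receives X's shutdown announcement: mark X down and start its quarantine.
    void onShutdownMessage(String node, long nowMillis) {
        alive.put(node, false);
        if (quarantineMillis > 0) {
            quarantinedUntil.put(node, nowMillis + quarantineMillis);
        }
    }

    // Y receives gossip (e.g. from Z) suggesting X is alive.
    // Returns true if X was marked alive, false if the evidence was ignored.
    boolean onAliveEvidence(String node, long nowMillis) {
        Long until = quarantinedUntil.get(node);
        if (until != null && nowMillis < until) {
            // Still quarantined: likely stale state from a peer that missed the shutdown message.
            return false;
        }
        quarantinedUntil.remove(node);
        alive.put(node, true);
        return true;
    }

    boolean isAlive(String node) {
        return alive.getOrDefault(node, false);
    }

    public static void main(String[] args) {
        QuarantineSketch y = new QuarantineSketch(1_000);
        y.onAliveEvidence("X", 0);
        y.onShutdownMessage("X", 100);                     // quarantine until t = 1100
        System.out.println(y.onAliveEvidence("X", 500));   // false: Z's stale gossip ignored
        System.out.println(y.onAliveEvidence("X", 1_100)); // true: quarantine expired
    }
}
```

The same window that blocks Z's stale gossip also blocks a legitimately restarted X from being marked alive until it expires, which is the trade-off the description cites for making the feature opt-in via a -D parameter.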