[ 
https://issues.apache.org/jira/browse/CASSANDRA-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481559#comment-14481559
 ] 

Brandon Williams commented on CASSANDRA-8336:
---------------------------------------------

There is one last wrinkle with this: if a bootstrap is started but then aborted
by the operator, the shutdown message effectively makes the node part of the
ring, because it will be persisted to system.peers, which then confuses
clients.  I believe the same will happen with an aborted replace_address, or
with any non-normal state that is aborted and then sends the shutdown state.
One solution might be to have Gossiper's stop() examine its own state and
compare it against the dead states and the joining state to decide whether to
send the shutdown state.
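
To make the proposed check concrete, here is a minimal sketch of the decision
stop() could make.  It is not Cassandra's actual Gossiper code: the class name,
the Status enum, and the exact membership of the "silent" set are assumptions
made for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names throughout; this is not Cassandra's Gossiper API.
final class ShutdownAnnouncementPolicy {

    // Simplified stand-in for the node's own gossip status (assumed values).
    enum Status { BOOTSTRAPPING, REPLACING, NORMAL, LEAVING, REMOVING, REMOVED, LEFT, HIBERNATE }

    // Assumed "dead" states plus the joining-type states: in any of these the
    // node is not a normal ring member, so it should stay silent on stop().
    private static final Set<Status> SILENT_STATES = EnumSet.of(
            Status.BOOTSTRAPPING, Status.REPLACING,
            Status.REMOVING, Status.REMOVED, Status.LEFT, Status.HIBERNATE);

    // Returns true only when announcing shutdown cannot add a never-joined
    // (or already dead) node to its peers' view of the ring.
    static boolean shouldAnnounceShutdown(Status ownStatus) {
        return ownStatus != null && !SILENT_STATES.contains(ownStatus);
    }
}
```

With this shape, stop() would send the shutdown state only when
shouldAnnounceShutdown(...) returns true, so an aborted bootstrap or
replace_address would exit without gossiping anything.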

> Quarantine nodes after receiving the gossip shutdown message
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-8336
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8336
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 2.0.15
>
>         Attachments: 8336-v2.txt, 8336-v3.txt, 8336-v4.txt, 8336.txt
>
>
> In CASSANDRA-3936 we added a gossip shutdown announcement.  The problem is
> that this isn't sufficient: you can still get TOEs (TimedOutExceptions) and
> have to wait on the FD (failure detector) to figure things out.  This happens
> due to gossip propagation time and variance: if node X shuts down and sends
> the message to Y, but Z has a greater gossip version than Y for X and has not
> yet received the message, Z can initiate gossip with Y and thus mark X alive
> again.  I propose quarantining to solve this; however, I feel it should be a
> -D parameter you have to specify, so as not to destroy current dev and test
> practices, since a node that shuts down will not be able to restart until
> the quarantine expires.
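
The quarantine idea in the description above can be sketched roughly as
follows.  This is an illustration under assumptions, not the attached patch:
the class and method names are hypothetical, endpoints are plain strings, and
the quarantine length is a constructor argument standing in for the proposed
-D parameter, whose name the description does not give.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: remembers endpoints that announced shutdown and
// refuses to let stale gossip mark them alive until the quarantine expires.
final class ShutdownQuarantine {
    private final long quarantineMillis;
    private final Map<String, Long> expiryByEndpoint = new HashMap<>();

    ShutdownQuarantine(long quarantineMillis) {
        this.quarantineMillis = quarantineMillis;
    }

    // Called when a shutdown announcement for the endpoint is received.
    void onShutdown(String endpoint, long nowMillis) {
        expiryByEndpoint.put(endpoint, nowMillis + quarantineMillis);
    }

    // Returns false while the endpoint is quarantined, so gossip from a peer
    // holding an older "alive" view (Z in the description) is ignored.
    boolean mayMarkAlive(String endpoint, long nowMillis) {
        Long expiry = expiryByEndpoint.get(endpoint);
        if (expiry == null) {
            return true;                        // never quarantined
        }
        if (nowMillis >= expiry) {
            expiryByEndpoint.remove(endpoint);  // quarantine is over
            return true;
        }
        return false;
    }
}
```

The sketch also shows the cost mentioned in the description: a node that
restarts before its quarantine expires is rejected by the same check, which is
why the behaviour is proposed as opt-in.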



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
