[ 
https://issues.apache.org/jira/browse/CASSANDRA-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244880#comment-14244880
 ] 

Brandon Williams edited comment on CASSANDRA-8336 at 12/12/14 10:18 PM:
------------------------------------------------------------------------

Perhaps one thing we could do is put the node into hibernation before sending 
the shutdown message.  This way, it will never get marked alive regardless of 
the heartbeat, even if the heartbeat propagates later.  We might want a new 
dead state for that, though: overloading the hibernation state with too many 
functions will complicate knowing what state a node is really in.


was (Author: brandon.williams):
Perhaps one thing we could do is put the node into hibernation before the 
shutdown message.  This way, it will never get marked alive regardless of the 
heartbeat, even if it propagates later.  We might want a new dead state for 
that though, since I don't want to overload the hibernation state with too many 
functions since that will complicate known what state a node is really in.

> Quarantine nodes after receiving the gossip shutdown message
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-8336
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8336
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 2.0.12
>
>
> In CASSANDRA-3936 we added a gossip shutdown announcement.  The problem here 
> is that this isn't sufficient; you can still get TimedOutExceptions (TOEs) 
> and have to wait on the failure detector (FD) to figure things out.  This 
> happens due to gossip propagation time and variance; if node X shuts down 
> and sends the message to Y, but Z has a greater gossip version than Y for X 
> and has not yet received the message, Z can initiate gossip with Y and thus 
> mark X alive again.  I propose quarantining to solve this; however, I feel it 
> should be a -D parameter you have to specify, so as not to destroy current 
> dev and test practices, since this will mean a node that shuts down will not 
> be able to restart until the quarantine expires.
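
The race described above, and the effect of a quarantine on it, can be 
sketched as a toy model.  This is only an illustration under stated 
assumptions, not Cassandra's Gossiper: the class ToyGossiper, its methods, 
and the quarantine_seconds setting (standing in for the proposed -D 
parameter, whose real name is not given in the ticket) are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToyGossiper:
    """One node's (Y's) view of its peers. Hypothetical, not Cassandra code."""

    # Hypothetical stand-in for the proposed -D parameter; 0 disables quarantine.
    quarantine_seconds: float = 30.0
    # Injectable clock so the behaviour can be exercised deterministically.
    clock: Callable[[], float] = time.monotonic
    # endpoint -> highest gossip version seen for that endpoint
    versions: Dict[str, int] = field(default_factory=dict)
    # endpoint -> whether this node currently considers it alive
    alive: Dict[str, bool] = field(default_factory=dict)
    # endpoint -> time at which its quarantine ends
    quarantined_until: Dict[str, float] = field(default_factory=dict)

    def on_shutdown(self, endpoint: str) -> None:
        """Handle the shutdown announcement: mark dead and start the quarantine."""
        self.alive[endpoint] = False
        self.quarantined_until[endpoint] = self.clock() + self.quarantine_seconds

    def is_quarantined(self, endpoint: str) -> bool:
        expiry = self.quarantined_until.get(endpoint)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            # Quarantine has expired; forget it.
            del self.quarantined_until[endpoint]
            return False
        return True

    def on_gossip_state(self, endpoint: str, version: int) -> bool:
        """Apply state about `endpoint` learned via gossip with a third node.

        Returns True if the state was applied (endpoint marked alive).
        """
        if self.is_quarantined(endpoint):
            return False  # ignore anything about a quarantined endpoint
        if version <= self.versions.get(endpoint, -1):
            return False  # stale: nothing newer than what we already have
        self.versions[endpoint] = version
        self.alive[endpoint] = True
        return True
```

With quarantine_seconds set to 0, Y handles X's shutdown and then accepts a 
newer version of X's state from Z, marking X alive again, which is the 
problem in the description.  With a non-zero quarantine, that state is 
ignored until the quarantine expires; the same rule is why a restarted X 
would not be accepted before then.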



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
