[jira] Commented: (CASSANDRA-1108) ability to forcibly mark machines failed

Brandon Williams (JIRA) Mon, 27 Dec 2010 12:35:41 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975332#action_12975332 ]


Brandon Williams commented on CASSANDRA-1108:
---------------------------------------------

I see two approaches here. The more complex one adds a new flag to gossip 
saying this node is blacklisted, propagates it around the cluster, and 
maintains a map of blacklisted nodes that we check in many places. The simpler 
one just tells the bad node to stop gossiping, which lets the failure detector 
(FD) on the other nodes mark it down. The downside to the latter approach is 
that if the bad node is so sick (e.g., in a GC death spiral) that you can't 
complete the JMX call, you're still stuck, but hopefully at that point it's in 
such bad shape that gossip doesn't work either.
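
To make the trade-off concrete, here is a rough toy model of both approaches 
from the point of view of one peer. It is a sketch only, not Cassandra code: 
PeerView, on_heartbeat, is_alive, and the operator_dead flag are hypothetical 
names, and a fixed silence timeout stands in for the real phi accrual failure 
detector.

```python
# Toy model only: hypothetical names, not Cassandra's Gossiper/FailureDetector API.
from dataclasses import dataclass, field


@dataclass
class PeerView:
    """One node's view of which peers are alive."""
    timeout: float = 10.0  # seconds of silence before a peer counts as down
    last_heard: dict = field(default_factory=dict)  # peer -> time of last heartbeat
    blacklisted: set = field(default_factory=set)   # operator-imposed deadness

    def on_heartbeat(self, peer, now, operator_dead=False):
        """Record a heartbeat; the flag models the extra gossip state."""
        self.last_heard[peer] = now
        if operator_dead:
            # Approach 1: remember the flag so that a later, unflagged
            # heartbeat cannot flip the peer back to live.
            self.blacklisted.add(peer)

    def is_alive(self, peer, now):
        # Approach 1: the blacklist has to be consulted on every liveness check.
        if peer in self.blacklisted:
            return False
        # Approach 2: a peer that stopped gossiping simply times out.
        heard = self.last_heard.get(peer)
        return heard is not None and now - heard <= self.timeout


view = PeerView(timeout=10.0)

# Approach 2: node "a" is told to stop gossiping after t=0.
view.on_heartbeat("a", now=0.0)
a_alive_early = view.is_alive("a", now=5.0)   # still within the timeout
a_alive_late = view.is_alive("a", now=11.0)   # silence exceeded the timeout

# Approach 1: node "b" keeps heartbeating but carries the operator flag once.
view.on_heartbeat("b", now=0.0, operator_dead=True)
view.on_heartbeat("b", now=1.0)
b_alive = view.is_alive("b", now=2.0)         # blacklist wins over the heartbeat
```

In this sketch the second approach needs no extra state on the peers, which 
is why it is the simpler of the two. The first needs the blacklist check 
everywhere liveness is consulted, but it works even while the sick node keeps 
gossiping.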

> ability to forcibly mark machines failed
> ----------------------------------------
>
>                 Key: CASSANDRA-1108
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1108
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Tools
>            Reporter: Jonathan Ellis
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 0.7.1
>
>
> For when a node is failing but not yet so badly that it can't participate in 
> gossip (e.g. hard disk failing but not dead yet) we should give operators the 
> power to forcibly mark a node as dead.
> I think we'd need to add an extra flag in gossip to say "this deadness is 
> operator-imposed" or the next heartbeat will flip it back to live.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
