[ https://issues.apache.org/jira/browse/CASSANDRA-6961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960455#comment-13960455 ]
Tyler Hobbs edited comment on CASSANDRA-6961 at 4/4/14 9:49 PM:
----------------------------------------------------------------

I'm seeing some issues with repair while one node is running with join_ring=false. Here's what I did:

* Start a three-node ccm cluster
* Start a stress write with RF=3
* Stop node3
* Start node3 with join_ring=false
* Run a repair against node3

It looks like the repair finishes all of its diffing and streaming, but the repair command hangs, and netstats shows continuously increasing completed Command/Response counts.

was (Author: thobbs):
I'm seeing some issues with repair while one node is running with join_ring=false. Here's what I did:

* Start a three-node ccm cluster
* Start a stress write with RF=3
* Stop node3
* Start node3
* Run a repair against node3

It looks like the repair finishes all of its diffing and streaming, but the repair command hangs, and netstats shows continuously increasing completed Command/Response counts.

> nodes should go into hibernate when join_ring is false
> ------------------------------------------------------
>
>                 Key: CASSANDRA-6961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6961
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 2.0.7
>
>         Attachments: 6961.txt
>
>
> The impetus here is this: a node that was down for some period and comes back
> can serve stale information. We know from CASSANDRA-768 that we can't just
> wait for hints, and we know that the tangentially related CASSANDRA-3569
> prevents us from having the node in a down (from the FD's POV) state handle
> streaming. We can *almost* set join_ring to false, then repair, and then join
> the ring to narrow the window (actually, you can do this and everything
> succeeds because the node doesn't know it's a member yet, which is probably a
> bit of a bug.)
> If instead we modified this to put the node in hibernate, like
> replace_address does, it could work almost like replace, except you could run
> a repair (manually) while in the hibernate state, and then flip to normal
> when it's done.
> This won't prevent staleness 100%, but it will greatly reduce the chance if
> the node has been down for a significant amount of time.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
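For reference, the repro steps from the comment above can be sketched as a ccm session. This is a minimal sketch, not the reporter's exact commands: the cluster name, Cassandra version, stress invocation, and the `--jvm_arg` way of passing `-Dcassandra.join_ring=false` are all assumptions based on typical ccm usage.

```shell
# Create and start a three-node ccm cluster (name and version are assumptions)
ccm create repro-6961 -v 2.0.6 -n 3 -s

# Kick off a stress write with RF=3 (old cassandra-stress syntax; -l sets the
# replication factor, -n the operation count -- both illustrative values)
ccm stress -- -d 127.0.0.1 -l 3 -n 100000

# Stop node3, then restart it without joining the ring
ccm node3 stop
ccm node3 start --jvm_arg="-Dcassandra.join_ring=false"

# Run a repair against node3; watch netstats for the hanging
# Command/Response counters described in the comment
ccm node3 nodetool repair
ccm node3 nodetool netstats
```

With a build containing the attached 6961.txt patch, the expectation is that node3 sits in hibernate while the repair runs, rather than appearing as a normal ring member to its peers.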