[ 
https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109230#comment-13109230
 ] 

Nick Bailey commented on CASSANDRA-2434:
----------------------------------------

Ok, so I think there are really two consistency issues here. Firstly, picking 
the 'right' node to stream data to/from when making changes to the ring. 
Secondly, disallowing concurrent changes that have overlapping ranges.

Currently we only disallow nodes from moving/decommissioning when they may 
potentially have data being streamed to them. There are a few examples of 
things we currently allow which I think are generally a bad idea.

1) Say you have nodes A and D. If you bootstrap nodes B and C at the same time 
in between A and D, it may turn out that the correct node to stream from for 
both nodes is D. Now say node C finishes bootstrapping before node B. At that 
point, the correct node for B to bootstrap from is technically C, although D 
still has the data. However, since D is no longer technically responsible for 
the data, the user could run cleanup on D and delete the data that B is 
attempting to stream.

2) The above case is also a problem when you bootstrap a node and the node it 
decides it needs to stream from is moving. Once that node finishes moving you 
could run cleanup on that node and delete data that the bootstrapping node 
needs. In this case, all documentation indicates you should do a cleanup after 
a move in order to remove old data, so it seems possibly more likely.

3) A variation of the above case is when you bootstrap a node and the node it 
streams from is leaving. In that case the decom may finish and the user could 
terminate the Cassandra process and/or the node, breaking any streams. Not to 
mention the idea of a node in a decommissioned state continuing to stream seems 
like a bad idea. I believe it would work currently, but I'm not sure and it 
seems likely to break.
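
To make case 1 concrete, here is a minimal sketch of a token ring. It assumes 
RF=1 and made-up token values (0, 30, 60, 90 for A, B, C, D), and the `owner` 
helper is purely illustrative, not Cassandra's actual code. It only shows how 
the node responsible for B's future range changes once C finishes first.

```python
from bisect import bisect_left

def owner(ring, token):
    """Return the node responsible for `token`.

    `ring` maps token -> node name. A node with token t owns the range
    (previous token, t], so the owner is the first node whose token is
    >= the given token, wrapping around the ring.
    """
    tokens = sorted(ring)
    i = bisect_left(tokens, token)
    return ring[tokens[i % len(tokens)]]

# Hypothetical tokens; RF=1 to keep the example small.
ring = {0: "A", 90: "D"}
b_token, c_token = 30, 60

# B will own (0, 30]. When B starts bootstrapping, D owns that range.
source_at_start = owner(ring, b_token)

# C finishes bootstrapping before B and joins the ring.
ring[c_token] = "C"

# The range (0, 30] is now C's responsibility, so D may clean it up
# while B is still streaming from D.
source_after_c = owner(ring, b_token)

print(source_at_start, source_after_c)
```

With these values B starts out streaming from D, but after C joins the ring 
the same range belongs to C, which is exactly the window in which a cleanup 
on D can remove data B still needs.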

I can't really think of any other examples, but I think that's enough to 
illustrate that overlapping concurrent ring changes are a bad idea and we 
should just attempt to prevent them in all cases. An argument could be made 
that this would prevent you from doubling your cluster (the best way to grow) 
all at once, but I don't think that's really a huge deal. At most you would 
need RF steps to double your cluster.
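
As a rough sketch of what "prevent overlapping concurrent ring changes" could 
look like, the snippet below keeps a set of in-flight changes and refuses a 
new one whose ranges overlap any of them. The `RingChangeGuard` class, its 
method names, and the example ranges are all hypothetical, and how the 
affected ranges of a bootstrap/move/decommission are computed is left out 
entirely.

```python
def unwrap(r):
    """Split a ring range (left, right] into non-wrapping pieces.

    A range with left >= right is treated as wrapping around the ring.
    """
    left, right = r
    if left < right:
        return [(left, right)]
    return [(left, float("inf")), (float("-inf"), right)]

def overlaps(a, b):
    """True if the two (left, right] ring ranges share any token."""
    return any(l1 < r2 and l2 < r1
               for l1, r1 in unwrap(a)
               for l2, r2 in unwrap(b))

class RingChangeGuard:
    """Tracks in-flight ring changes and rejects overlapping ones."""

    def __init__(self):
        self.pending = {}  # node name -> list of affected ranges

    def try_start(self, node, ranges):
        """Register a change for `node`; return False if it overlaps."""
        for other_ranges in self.pending.values():
            if any(overlaps(r, o) for r in ranges for o in other_ranges):
                return False
        self.pending[node] = list(ranges)
        return True

    def finish(self, node):
        """Mark the change for `node` as complete."""
        self.pending.pop(node, None)

guard = RingChangeGuard()
b_ok = guard.try_start("B", [(0, 30)])      # nothing pending yet
c_ok = guard.try_start("C", [(0, 60)])      # overlaps B's pending range
e_ok = guard.try_start("E", [(100, 120)])   # disjoint from B's range
guard.finish("B")
c_retry = guard.try_start("C", [(0, 60)])   # allowed once B is done
```

In this sketch B and E may proceed concurrently, while C has to wait until B 
has finished.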

> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic 
> involved during bootstrapping that avoids consistency level violations. If I 
> recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, 
> and bootstrapping completes based on only one node, I have just violated the 
> supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to 
> look at which nodes are up and thinks about it carefully before 
> bootstrapping, there's no guarantee.
> A complication is that not only does it depend on the use case whether this 
> is an issue (if everything you do is at CL.ONE, it's fine); even in a cluster 
> which is otherwise used for QUORUM operations you may wish to accept 
> less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the 
> number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to 
> bootstrap to a local node and then do a repair to avoid sending loads of data 
> across DCs while still achieving consistency. Or even if you don't care 
> about the consistency issues, I don't think there is currently a way to 
> bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
