[jira] [Commented] (CASSANDRA-2434) range movements can violate consistency

Jonathan Ellis (JIRA) Tue, 20 Sep 2011 04:59:37 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108617#comment-13108617 ]


Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

bq. It's always been unsupported to bootstrap a second node into the same 
"token arc" while a previous one is ongoing.

I'm pretty sure now that this is incorrect; we fixed it back in CASSANDRA-603.  
I'm updating the comments in TokenMetadata as follows:

{noformat}
    // Prior to CASSANDRA-603, we just had <tt>Map<Range, InetAddress> pendingRanges</tt>,
    // which was added to when a node began bootstrap and removed from when it finished.
    //
    // This is inadequate when multiple changes are allowed simultaneously.  For example,
    // suppose that there is a ring of nodes A, C and E, with replication factor 3.
    // Node D bootstraps between C and E, so its pending ranges will be E-A, A-C and C-D.
    // Now suppose node B bootstraps between A and C at the same time. Its pending ranges
    // would be C-E, E-A and A-B. Now both nodes need to be assigned pending range E-A,
    // which we would be unable to represent with the old Map.  The same thing happens
    // even more obviously for any nodes that boot simultaneously between the same two nodes.
    //
    // So, we made two changes:
    //
    // First, we changed pendingRanges to a <tt>Multimap<Range, InetAddress></tt> (now
    // <tt>Map<String, Multimap<Range, InetAddress>></tt>, because replication strategy
    // and options are per-KeySpace).
    //
    // Second, we added the bootstrapTokens and leavingEndpoints collections, so we can
    // rebuild pendingRanges from the complete information of what is going on, when
    // additional changes are made mid-operation.
    //
    // Finally, note that recording the tokens of joining nodes in bootstrapTokens also
    // means we can detect and reject the addition of multiple nodes at the same token
    // before one becomes part of the ring.
    private BiMap<Token, InetAddress> bootstrapTokens = HashBiMap.create();
    // (don't need to record Token here since it's still part of tokenToEndpointMap
    // until it's done leaving)
    private Set<InetAddress> leavingEndpoints = new HashSet<InetAddress>();
    // this is a cache of the calculation from
    // {tokenToEndpointMap, bootstrapTokens, leavingEndpoints}
    private ConcurrentMap<String, Multimap<Range, InetAddress>> pendingRanges =
        new ConcurrentHashMap<String, Multimap<Range, InetAddress>>();
{noformat}
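
To make the A/C/E example in the comment above concrete, here is a minimal, self-contained sketch. It is not the TokenMetadata code: the class PendingRangesSketch and its methods are hypothetical, tokens and endpoints are plain strings, the multimap is emulated with a Map of Sets instead of Guava, and replica placement is assumed to be "the rf ranges ending at the node, walking backwards around the ring". Each joining node is considered on its own against the current ring, as in the comment's example.

```java
import java.util.*;

// Hypothetical illustration only; names and placement rule are assumptions.
public class PendingRangesSketch {

    // Ranges a node joining at `token` would replicate, assuming each node
    // holds the rf consecutive ranges that end at its own token.
    static List<String> pendingRangesFor(String token, List<String> ring, int rf) {
        TreeSet<String> sorted = new TreeSet<>(ring);
        sorted.add(token);
        List<String> tokens = new ArrayList<>(sorted);
        int n = tokens.size();
        int idx = tokens.indexOf(token);
        List<String> ranges = new ArrayList<>();
        for (int i = 0; i < Math.min(rf, n); i++) {
            // Walk backwards around the ring, wrapping at the start.
            String right = tokens.get(((idx - i) % n + n) % n);
            String left = tokens.get(((idx - i - 1) % n + n) % n);
            ranges.add(left + "-" + right);
        }
        return ranges;
    }

    // Rebuild the range -> pending endpoints view from the full set of
    // bootstrapping nodes (token -> endpoint), multimap-style.
    static Map<String, Set<String>> rebuild(Map<String, String> bootstrapTokens,
                                            List<String> ring, int rf) {
        Map<String, Set<String>> pending = new TreeMap<>();
        for (Map.Entry<String, String> e : bootstrapTokens.entrySet()) {
            for (String range : pendingRangesFor(e.getKey(), ring, rf)) {
                pending.computeIfAbsent(range, k -> new TreeSet<>()).add(e.getValue());
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        List<String> ring = Arrays.asList("A", "C", "E");

        // Old representation: one endpoint per range, so B silently replaces D.
        Map<String, String> oldStyle = new HashMap<>();
        oldStyle.put("E-A", "D");
        oldStyle.put("E-A", "B");
        System.out.println(oldStyle.get("E-A")); // prints B

        // Multimap-style representation rebuilt from the bootstrap tokens.
        Map<String, String> bootstrapTokens = new LinkedHashMap<>();
        bootstrapTokens.put("D", "D");
        bootstrapTokens.put("B", "B");
        System.out.println(rebuild(bootstrapTokens, ring, 3));
        // prints {A-B=[B], A-C=[D], C-D=[D], C-E=[B], E-A=[B, D]}
    }
}
```

With both D and B bootstrapping, range E-A ends up with two pending endpoints, which the single-valued Map cannot hold. Rebuilding from the token-to-endpoint map each time also mirrors the second change described in the comment: the pending view is derived state, so it can be recomputed when another node joins mid-operation.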


> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic 
> involved during bootstrapping that avoids consistency level violations. If I 
> recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3, 
> and bootstrapping completes based on only one node, I have just violated the 
> supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to 
> look at which nodes are up and thinks about it carefully before 
> bootstrapping, there's no guarantee.
> A complication is that whether this is an issue depends on the use case 
> (if everything you do is at CL.ONE, it's fine); and even in a cluster 
> which is otherwise used for QUORUM operations you may wish to accept 
> less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the 
> number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to 
> bootstrap to a local node and then do a repair to avoid sending loads of data 
> across DCs while still achieving consistency. Or even if you don't care 
> about the consistency issues, I don't think there is currently a way to 
> bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
