[ https://issues.apache.org/jira/browse/CASSANDRA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108617#comment-13108617 ]
Jonathan Ellis commented on CASSANDRA-2434:
-------------------------------------------

bq. It's always been unsupported to bootstrap a second node into the same "token arc" while a previous one is ongoing.

I'm pretty sure now that this is incorrect; we fixed it back in CASSANDRA-603. I'm updating the comments in TokenMetadata as follows:

{noformat}
    // Prior to CASSANDRA-603, we just had <tt>Map<Range, InetAddress> pendingRanges</tt>,
    // which was added to when a node began bootstrap and removed from when it finished.
    //
    // This is inadequate when multiple changes are allowed simultaneously. For example,
    // suppose that there is a ring of nodes A, C and E, with replication factor 3.
    // Node D bootstraps between C and E, so its pending ranges will be E-A, A-C and C-D.
    // Now suppose node B bootstraps between A and C at the same time. Its pending ranges
    // would be C-E, E-A and A-B. Now both nodes need to be assigned pending range E-A,
    // which we would be unable to represent with the old Map. The same thing happens
    // even more obviously for any nodes that boot simultaneously between the same two nodes.
    //
    // So, we made two changes:
    //
    // First, we changed pendingRanges to a <tt>Multimap<Range, InetAddress></tt> (now
    // <tt>Map<String, Multimap<Range, InetAddress>></tt>, because replication strategy
    // and options are per-KeySpace).
    //
    // Second, we added the bootstrapTokens and leavingEndpoints collections, so we can
    // rebuild pendingRanges from the complete information of what is going on, when
    // additional changes are made mid-operation.
    //
    // Finally, note that recording the tokens of joining nodes in bootstrapTokens also
    // means we can detect and reject the addition of multiple nodes at the same token
    // before one becomes part of the ring.
    private BiMap<Token, InetAddress> bootstrapTokens = HashBiMap.create();
    // (don't need to record Token here since it's still part of tokenToEndpointMap until it's done leaving)
    private Set<InetAddress> leavingEndpoints = new HashSet<InetAddress>();
    // this is a cache of the calculation from {tokenToEndpointMap, bootstrapTokens, leavingEndpoints}
    private ConcurrentMap<String, Multimap<Range, InetAddress>> pendingRanges = new ConcurrentHashMap<String, Multimap<Range, InetAddress>>();
{noformat}

> range movements can violate consistency
> ---------------------------------------
>
>                 Key: CASSANDRA-2434
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2434
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: paul cannon
>             Fix For: 1.0.1
>
>         Attachments: 2434-3.patch.txt, 2434-testery.patch.txt
>
>
> My reading (a while ago) of the code indicates that there is no logic
> involved during bootstrapping that avoids consistency level violations. If I
> recall correctly it just grabs neighbors that are currently up.
> There are at least two issues I have with this behavior:
> * If I have a cluster where I have applications relying on QUORUM with RF=3,
> and bootstrapping completes based on only one node, I have just violated the
> supposedly guaranteed consistency semantics of the cluster.
> * Nodes can flap up and down at any time, so even if a human takes care to
> look at which nodes are up and think about it carefully before
> bootstrapping, there's no guarantee.
> A complication is that whether this is an issue depends on the use case
> (if everything you do is at CL.ONE, it's fine), and even in a cluster
> which is otherwise used for QUORUM operations you may wish to accept
> less-than-quorum nodes during bootstrap in various emergency situations.
> A potential easy fix is to have bootstrap take an argument which is the
> number of hosts to bootstrap from, or to assume QUORUM if none is given.
> (A related concern is bootstrapping across data centers. You may *want* to
> bootstrap from a local node and then do a repair to avoid sending loads of data
> across DCs while still achieving consistency. Or even if you don't care
> about the consistency issues, I don't think there is currently a way to
> bootstrap from local nodes only.)
> Thoughts?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
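
For readers following the TokenMetadata comment quoted above, here is a minimal sketch of why a single-valued map cannot hold the pending ranges once two nodes bootstrap at the same time. It is not Cassandra code: the class PendingRangesSketch and the method pendingRangesFor are hypothetical, node names stand in for tokens, a plain Map<String, Set<String>> stands in for Guava's Multimap, and replica placement is assumed to simply walk the ring backwards with no rack or data center awareness.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PendingRangesSketch {

    // Hypothetical helper: ranges a joining node would take on, assuming it
    // gets its own primary range plus the (rf - 1) ranges before it on the
    // existing ring. Assumes rf does not exceed the number of ring nodes.
    static List<String> pendingRangesFor(String joining, TreeSet<String> ring, int rf) {
        List<String> ranges = new ArrayList<>();
        String right = joining;
        for (int i = 0; i < rf; i++) {
            String left = ring.lower(right);   // nearest existing node before 'right'
            if (left == null) {
                left = ring.last();            // wrap around the ring
            }
            ranges.add(left + "-" + right);
            right = left;
        }
        return ranges;
    }

    public static void main(String[] args) {
        TreeSet<String> ring = new TreeSet<>(Arrays.asList("A", "C", "E"));
        int rf = 3;

        // Old shape: one endpoint per range, so a second joiner overwrites the first.
        Map<String, String> singleValued = new HashMap<>();
        // Multimap-like shape: every range keeps all of its pending endpoints.
        Map<String, Set<String>> multiValued = new HashMap<>();

        for (String node : Arrays.asList("D", "B")) {
            for (String range : pendingRangesFor(node, ring, rf)) {
                singleValued.put(range, node);
                multiValued.computeIfAbsent(range, r -> new LinkedHashSet<>()).add(node);
            }
        }

        System.out.println("single-valued E-A -> " + singleValued.get("E-A"));
        System.out.println("multi-valued  E-A -> " + multiValued.get("E-A"));
    }
}
```

With the A, C, E ring and RF=3 from the comment, the single-valued map ends up holding only B for range E-A because D's entry is overwritten, while the set-valued map keeps both D and B.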