[ https://issues.apache.org/jira/browse/CASSANDRA-13348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944078#comment-15944078 ]
Tom van der Woerdt edited comment on CASSANDRA-13348 at 3/27/17 9:32 PM:
-------------------------------------------------------------------------

Murmur3 partitioner, yes. GossipingPropertyFileSnitch, if it matters.

I don't recall exactly how this cluster was built, but it was something like this :

* Provision 5 nodes per DC, all but one with "-Dcassandra.join_ring=false". Keyspace with rf= dc1:2 dc2:2
* "nodetool join" one at a time (random order)
* Provision 30 nodes in dc1 -- all have "allocate_tokens_for_keyspace" set
* "nodetool join" ~10
* Decommission the first five, so we're now left with dc1:10
* "nodetool join" the rest
* Ditto for dc2, so we now have dc1:30 dc2:30

There's a lot of automation involved, a human may take a different route to doing this. I decommissioned the 10 initial nodes which had non-ideal hardware, and they made place for 60 more powerful machines. The "nodetool join" batches to join the final 20 in the DC caused the bad tokens.


was (Author: tvdw):
Murmur3 simulator, yes. GossipingPropertyFileSnitch, if it matters.

I don't recall exactly how this cluster was built, but it was something like this :

* Provision 5 nodes per DC, all but one with "-Dcassandra.join_ring=false". Keyspace with rf= dc1:2 dc2:2
* "nodetool join" one at a time (random order)
* Provision 30 nodes in dc1 -- all have "allocate_tokens_for_keyspace" set
* "nodetool join" ~10
* Decommission the first five, so we're now left with dc1:10
* "nodetool join" the rest
* Ditto for dc2, so we now have dc1:30 dc2:30

There's a lot of automation involved, a human may take a different route to doing this. I decommissioned the 10 initial nodes which had non-ideal hardware, and they made place for 60 more powerful machines. The "nodetool join" batches to join the final 20 in the DC caused the bad tokens.

> Duplicate tokens after bootstrap
> --------------------------------
>
>                 Key: CASSANDRA-13348
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13348
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Tom van der Woerdt
>            Priority: Blocker
>             Fix For: 3.0.x
>
>
> This one is a bit scary, and probably results in data loss. After a bootstrap
> of a few new nodes into an existing cluster, two new nodes have chosen some
> overlapping tokens.
> In fact, of the 256 tokens chosen, 51 tokens were already in use on the other
> node.
> Node 1 log :
> {noformat}
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,461 StorageService.java:1160 - JOINING: waiting for ring information
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,461 StorageService.java:1160 - JOINING: waiting for schema information to complete
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,461 StorageService.java:1160 - JOINING: schema complete, ready to bootstrap
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,462 StorageService.java:1160 - JOINING: waiting for pending range calculation
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,462 StorageService.java:1160 - JOINING: calculation complete, ready to bootstrap
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,462 StorageService.java:1160 - JOINING: getting bootstrap token
> WARN  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,564 TokenAllocation.java:61 - Selected tokens [............, 2959334889475814712, 3727103702384420083, 7183119311535804926, 6013900799616279548, -1222135324851761575, 1645259890258332163, -1213352346686661387, 7604192574911909354]
> WARN  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,729 TokenAllocation.java:65 - Replicated node load in datacentre before allocation max 1.00 min 1.00 stddev 0.0000
> WARN  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,729 TokenAllocation.java:66 - Replicated node load in datacentre after allocation max 1.00 min 1.00 stddev 0.0000
> WARN  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:43,729 TokenAllocation.java:70 - Unexpected growth in standard deviation after allocation.
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:42:44,150 StorageService.java:1160 - JOINING: sleeping 30000 ms for pending range setup
> INFO  [RMI TCP Connection(107)-127.0.0.1] 2017-03-09 07:43:14,151 StorageService.java:1160 - JOINING: Starting to bootstrap...
> {noformat}
> Node 2 log:
> {noformat}
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:51,937 StorageService.java:971 - Joining ring by operator request
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,513 StorageService.java:1160 - JOINING: waiting for ring information
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,513 StorageService.java:1160 - JOINING: waiting for schema information to complete
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,513 StorageService.java:1160 - JOINING: schema complete, ready to bootstrap
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,513 StorageService.java:1160 - JOINING: waiting for pending range calculation
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,514 StorageService.java:1160 - JOINING: calculation complete, ready to bootstrap
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,514 StorageService.java:1160 - JOINING: getting bootstrap token
> WARN  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,630 TokenAllocation.java:61 - Selected tokens [......, 2890709530010722764, -2416006722819773829, -5820248611267569511, -5990139574852472056, 1645259890258332163, 9135021011763659240, -5451286144622276797, 7604192574911909354]
> WARN  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,794 TokenAllocation.java:65 - Replicated node load in datacentre before allocation max 1.02 min 0.98 stddev 0.0000
> WARN  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:52,795 TokenAllocation.java:66 - Replicated node load in datacentre after allocation max 1.00 min 1.00 stddev 0.0000
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:55:53,149 StorageService.java:1160 - JOINING: sleeping 30000 ms for pending range setup
> INFO  [RMI TCP Connection(380)-127.0.0.1] 2017-03-17 15:56:23,149 StorageService.java:1160 - JOINING: Starting to bootstrap...
> {noformat}
> eg. 7604192574911909354 has been chosen by both.
> The joins were eight days apart, so I don't think it's a race :)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
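
The overlap mentioned in the description can be checked mechanically. The following is only a minimal sketch: it uses just the tokens that are visible in the two "Selected tokens" log lines above (the elided entries are left out, so it cannot reproduce the reported count of 51), and the names node1_tokens, node2_tokens and shared_tokens are made up for illustration.

```python
# Tokens copied from the two "Selected tokens" log lines.
# The elided part of each list ("[............," / "[......,") is NOT included.
node1_tokens = [
    2959334889475814712, 3727103702384420083, 7183119311535804926,
    6013900799616279548, -1222135324851761575, 1645259890258332163,
    -1213352346686661387, 7604192574911909354,
]
node2_tokens = [
    2890709530010722764, -2416006722819773829, -5820248611267569511,
    -5990139574852472056, 1645259890258332163, 9135021011763659240,
    -5451286144622276797, 7604192574911909354,
]


def shared_tokens(a, b):
    """Return the tokens present in both lists, sorted ascending."""
    return sorted(set(a) & set(b))


overlap = shared_tokens(node1_tokens, node2_tokens)
print(overlap)
```

With only these eight visible tokens per node, two of them already collide, including the 7604192574911909354 called out in the description.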