[ https://issues.apache.org/jira/browse/CASSANDRA-16364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847494#comment-17847494 ]

Michael Semb Wever commented on CASSANDRA-16364:
------------------------------------------------

Backing up [~jjordan]'s statement: token allocation is designed to be deterministic, and we don't support simultaneous bootstraps. The fault to fix here seems to be preventing/detecting this problem as early as possible (and better docs), per the original description of the ticket.

100% agree that the feature can and should be made safer. Changing the design to non-deterministic may work, but it is hacky, inappropriate in patch versions, and I'm sure it will introduce breakages (/more work) elsewhere given our assumptions on the design.

Does this apply in trunk with TCM? I think we should be removing fixVersion 5.x.

> Joining nodes simultaneously with auto_bootstrap:false can cause token
> collision
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16364
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16364
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Membership
>            Reporter: Paulo Motta
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> While raising a 6-node ccm cluster to test 4.0-beta4, 2 nodes chose the same
> tokens using the default {{allocate_tokens_for_local_rf}}. However, both
> bootstrapped successfully with colliding tokens.
> We were familiar with this issue from CASSANDRA-13701 and CASSANDRA-16079,
> and the workaround is to avoid parallel bootstrap when using
> {{allocate_tokens_for_local_rf}}.
> However, since this is the default behavior, we should try to detect and
> prevent this situation when possible, since it can break users relying on
> parallel bootstrap behavior.
> I think we could prevent this as follows:
> 1. announce intent to bootstrap via gossip (i.e. add node on gossip without
> token information)
> 2. wait for gossip to settle for a longer period (i.e. ring delay)
> 3. allocate tokens (if multiple bootstrap attempts are detected, tie break
> via node-id)
> 4. broadcast tokens and move on with bootstrap
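
The collision and the proposed tie-break from the ticket description can be illustrated with a small toy simulation. This is only a sketch under stated assumptions, not Cassandra's allocator: the 16-bit ring, the "split the largest gap" rule, and the names `allocate_tokens`, `simultaneous_bootstrap`, and `coordinated_bootstrap` are all hypothetical and chosen for illustration. It shows why a deterministic allocator yields identical tokens when two nodes start from the same ring view, and how ordering the allocations by node id avoids that.

```python
RING_SIZE = 2**16  # hypothetical toy token space, not Cassandra's real range


def allocate_tokens(taken, num_tokens):
    """Toy deterministic allocator: repeatedly split the largest gap on the ring."""
    taken = set(taken)
    new_tokens = []
    for _ in range(num_tokens):
        if not taken:
            token = 0
        else:
            ring = sorted(taken)
            best_start, best_gap = None, -1
            for i, start in enumerate(ring):
                end = ring[(i + 1) % len(ring)]
                # Gap to the next token, wrapping around the ring;
                # a single token owns the whole ring.
                gap = (end - start) % RING_SIZE or RING_SIZE
                if gap > best_gap:
                    best_start, best_gap = start, gap
            token = (best_start + best_gap // 2) % RING_SIZE
        taken.add(token)
        new_tokens.append(token)
    return new_tokens


def simultaneous_bootstrap(ring_tokens, joining_ids, num_tokens):
    """Each joining node allocates from the same stale ring view."""
    return {nid: allocate_tokens(ring_tokens, num_tokens) for nid in joining_ids}


def coordinated_bootstrap(ring_tokens, joining_ids, num_tokens):
    """Steps 1-4 of the proposal, collapsed into one process for illustration."""
    # Steps 1-2: assume every joining node has announced itself and gossip
    # has settled, so the full set of joining ids is known to everyone.
    taken = set(ring_tokens)
    result = {}
    # Step 3: tie-break via node id, so the lowest id allocates first.
    for nid in sorted(joining_ids):
        result[nid] = allocate_tokens(taken, num_tokens)
        # Step 4: "broadcast" the tokens so later nodes see them.
        taken.update(result[nid])
    return result


if __name__ == "__main__":
    ring = allocate_tokens([], 4)      # tokens already owned by the cluster
    joining = ["node-b", "node-a"]     # two nodes starting at the same time
    print("simultaneous:", simultaneous_bootstrap(ring, joining, 2))
    print("coordinated: ", coordinated_bootstrap(ring, joining, 2))
```

In the simultaneous case both nodes compute the same token list, because the allocator is deterministic and their inputs are identical. In the coordinated case the second node's input already contains the first node's tokens, so the results no longer overlap. A real implementation would also have to deal with gossip delays and failures between the steps, which this sketch ignores.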