[jira] [Commented] (CASSANDRA-16364) Joining nodes simultaneously with auto_bootstrap:false can cause token collision

Jon Haddad (Jira) Mon, 08 Apr 2024 12:16:05 -0700


    [ https://issues.apache.org/jira/browse/CASSANDRA-16364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835024#comment-17835024 ]


Jon Haddad commented on CASSANDRA-16364:
----------------------------------------

Just ran into this with the latest 4.0 while starting 9 nodes simultaneously, all
marked as seeds. Using this many seeds was an oversight, but I thought we
randomized the tokens in a way that would prevent this from happening.
{noformat}
INFO  [GossipStage:1] 2024-04-08 18:51:42,228 StorageService.java:2851 - Nodes /172.31.30.76:7000 and /172.31.32.145:7000 have the same token -4365585967229483808.  Ignoring /172.31.30.76:7000
INFO  [GossipStage:1] 2024-04-08 18:51:42,228 StorageService.java:2851 - Nodes /172.31.30.76:7000 and /172.31.32.145:7000 have the same token 156850771319184154.  Ignoring /172.31.30.76:7000
INFO  [GossipStage:1] 2024-04-08 18:51:42,228 StorageService.java:2851 - Nodes /172.31.30.76:7000 and /172.31.32.145:7000 have the same token 7039551456192731860.  Ignoring /172.31.30.76:7000
INFO  [GossipStage:1] 2024-04-08 18:51:42,229 StorageService.java:2851 - Nodes /172.31.30.76:7000 and /172.31.32.145:7000 have the same token 8579899636253633675.  Ignoring /172.31.30.76:7000
{noformat}
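
For scale: if every token were drawn independently and uniformly at random from a 64-bit ring (as with the default Murmur3Partitioner), a birthday-bound estimate puts the chance of any collision among 9 nodes at roughly 5.6e-16. The 16 tokens per node below is an assumed value, not taken from the report; this is a back-of-the-envelope sketch, not Cassandra code.

```python
import math

# A 64-bit token ring (as with Murmur3Partitioner) has roughly 2**64 values.
TOKEN_SPACE = 2**64

def random_collision_probability(num_nodes: int, tokens_per_node: int) -> float:
    """Birthday-bound estimate of the chance that at least two tokens in the
    cluster are equal, assuming each token is drawn independently and
    uniformly at random."""
    n = num_nodes * tokens_per_node
    # P(no collision) ~= exp(-n(n-1) / (2N)); expm1 keeps precision for tiny values.
    return -math.expm1(-n * (n - 1) / (2 * TOKEN_SPACE))

# 9 nodes as in the report; 16 tokens per node is an assumed value.
p = random_collision_probability(9, 16)
print(f"{p:.1e}")  # prints 5.6e-16
```

Under these assumptions, four shared tokens between the same two nodes, as logged above, are very unlikely to come from independent random draws.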

> Joining nodes simultaneously with auto_bootstrap:false can cause token collision
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16364
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16364
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Membership
>            Reporter: Paulo Motta
>            Priority: Normal
>             Fix For: 4.0.x
>
>
> While raising a 6-node ccm cluster to test 4.0-beta4, 2 nodes chose the same
> tokens using the default {{allocate_tokens_for_local_rf}}. However, both
> completed bootstrap with colliding tokens.
> We were familiar with this issue from CASSANDRA-13701 and CASSANDRA-16079,
> and the workaround is to avoid parallel bootstrap when using
> {{allocate_tokens_for_local_rf}}.
> However, since this is the default behavior, we should try to detect and
> prevent this situation when possible, because it can break users who rely on
> parallel bootstrap.
> I think we could prevent this as follows:
> 1. announce the intent to bootstrap via gossip (i.e. add the node to gossip
> without token information)
> 2. wait longer for gossip to settle (i.e. ring delay)
> 3. allocate tokens (if multiple bootstrap attempts are detected, tie-break
> via node-id)
> 4. broadcast tokens and move on with bootstrap
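
The four proposed steps can be sketched as a single-process simulation. This is a hypothetical illustration only: `allocate_tokens` is a toy stand-in, NOT the algorithm behind {{allocate_tokens_for_local_rf}}, and the sketch assumes only that allocation is a deterministic function of the ring view a node sees. It also assumes that after the wait in step 2 every joiner observes the same set of bootstrap intents, which gossip does not strictly guarantee. `plan_bootstrap` and the node ids are illustrative names.

```python
RING_MIN, SIZE = -2**63, 2**64  # 64-bit token ring, as with Murmur3Partitioner

def allocate_tokens(ring_view, num_tokens):
    """Toy deterministic allocator (hypothetical, not Cassandra's algorithm):
    repeatedly place a token in the middle of the largest gap in ring_view."""
    ring = sorted(set(ring_view)) or [RING_MIN]
    chosen = []
    for _ in range(num_tokens):
        # (start, length) of each arc between consecutive tokens, wrapping
        # around; a lone token owns the whole ring.
        arcs = [(ring[i], (ring[(i + 1) % len(ring)] - ring[i]) % SIZE or SIZE)
                for i in range(len(ring))]
        start, length = max(arcs, key=lambda arc: arc[1])
        token = (start + length // 2 - RING_MIN) % SIZE + RING_MIN
        chosen.append(token)
        ring = sorted(ring + [token])
    return chosen

def plan_bootstrap(ring_view, pending_ids, num_tokens):
    """Steps 3-4: every joiner runs this on the same inputs and therefore
    computes the same plan. Ties are broken by node id: lower ids allocate
    first, and later joiners see the tokens planned for earlier ones."""
    ring = list(ring_view)
    plan = {}
    for node_id in sorted(pending_ids):
        plan[node_id] = allocate_tokens(ring, num_tokens)
        ring.extend(plan[node_id])
    return plan

ring_view = allocate_tokens([], 4)  # toy tokens of one node already in the ring

# Without coordination: both joiners see the same ring and pick the same tokens.
naive_a = allocate_tokens(ring_view, 4)
naive_b = allocate_tokens(ring_view, 4)
print(naive_a == naive_b)  # prints True

# Steps 1-2: both joiners gossip an intent and, after waiting, see the same set.
intents = {"node-b", "node-a"}  # hypothetical node ids
tokens_a = plan_bootstrap(ring_view, intents, 4)["node-a"]  # computed on node-a
tokens_b = plan_bootstrap(ring_view, intents, 4)["node-b"]  # computed on node-b
print(set(tokens_a) & set(tokens_b))  # prints set()
```

Each joiner computes the whole plan but keeps only its own entry, so no extra coordination round is needed once the intent sets agree; if they do not agree, this sketch offers no protection.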



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

