[ 
https://issues.apache.org/jira/browse/CASSANDRA-16364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847572#comment-17847572
 ] 

Jon Haddad commented on CASSANDRA-16364:
----------------------------------------

bq. Seems the fault to fix here is preventing/detecting this problem as early 
as possible (and better docs) per the original description of the ticket. 100% 
agree that the feature can and should be made safer. Changing the design to 
non-deterministic may work but is hacky, inappropriate in patch versions and 
i'm sure will introduce breakages (/more work) elsewhere given our assumptions 
on the design.

I agree that detecting and preventing as early as possible is preferable.

The primary drawback of using jitter here is that nodes will bootstrap with 
very slim owned ranges.  That's fine for reducing contention in locks, but here 
it would result in significant ring imbalance, defeating the purpose of the 
token allocation algorithm.  Maybe that's what you mean by "hacky"?
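
To make the imbalance concrete, here is a rough sketch, not Cassandra code. 
It assumes a toy unsigned ring of size 2**64, one token per node, and that 
"jitter" means nudging the deterministically chosen token by a small offset. 
The names RING_SIZE and ownership() are made up for illustration.

```python
RING_SIZE = 2**64  # assumed toy token space, not Cassandra's actual signed range


def ownership(tokens):
    """Map each token to the fraction of the ring it owns.

    A token owns the range from the previous token (exclusive)
    up to itself (inclusive), wrapping around the ring.
    """
    ordered = sorted(set(tokens))
    owned = {}
    for i, tok in enumerate(ordered):
        prev = ordered[i - 1]  # index -1 wraps to the last token when i == 0
        span = (tok - prev) % RING_SIZE or RING_SIZE  # a lone token owns everything
        owned[tok] = span / RING_SIZE
    return owned


existing = [0, RING_SIZE // 2]
picked = RING_SIZE // 4          # token both joining nodes would compute
jitter = 1000                    # hypothetical small offset applied by the second node

# Second node jitters away from the collision: it owns only (picked, picked + jitter].
jittered = ownership(existing + [picked, picked + jitter])

# Second node instead allocates after seeing the first node's token.
sequential = ownership(existing + [picked, 3 * RING_SIZE // 4])

print(jittered[picked + jitter])          # a vanishingly small fraction of the ring
print(sequential[3 * RING_SIZE // 4])     # 0.25
```

The jittered node avoids the collision but ends up owning almost nothing, 
while its neighbours keep the load it was supposed to take over.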

> Joining nodes simultaneously with auto_bootstrap:false can cause token 
> collision
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16364
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16364
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Membership
>            Reporter: Paulo Motta
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0.x
>
>
> While raising a 6-node ccm cluster to test 4.0-beta4, 2 nodes chose the same 
> tokens using the default {{allocate_tokens_for_local_rf}}. However, they both 
> succeeded in bootstrapping with colliding tokens.
> We were familiar with this issue from CASSANDRA-13701 and CASSANDRA-16079, 
> and the workaround to fix this is to avoid parallel bootstrap when using 
> {{allocate_tokens_for_local_rf}}.
> However, since this is the default behavior, we should try to detect and 
> prevent this situation when possible, since it can break users relying on 
> parallel bootstrap behavior.
> I think we could prevent this as follows:
> 1. announce intent to bootstrap via gossip (i.e. add node on gossip without 
> token information)
> 2. wait for gossip to settle for a longer period (i.e. ring delay)
> 3. allocate tokens (if multiple bootstrap attempts are detected, tie break 
> via node-id)
> 4. broadcast tokens and move on with bootstrap
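
The tie-break in step 3 of the description could look roughly like the sketch 
below. This is a toy model under stated assumptions, not the actual allocator: 
bisect_largest_gap() stands in for the real deterministic token allocation, 
allocate_with_tie_break() and the node ids are hypothetical, and each node 
gets a single token on an unsigned ring of size 2**64.

```python
RING_SIZE = 2**64  # assumed toy token space


def bisect_largest_gap(tokens):
    """Toy deterministic allocator: midpoint of the widest gap on the ring."""
    if not tokens:
        return 0
    ordered = sorted(set(tokens))
    best_start, best_span = ordered[-1], 0
    for i, tok in enumerate(ordered):
        prev = ordered[i - 1]  # index -1 wraps to the last token when i == 0
        span = (tok - prev) % RING_SIZE or RING_SIZE
        if span > best_span:
            best_start, best_span = prev, span
    return (best_start + best_span // 2) % RING_SIZE


def allocate_with_tie_break(ring_tokens, pending_node_ids):
    """Allocate one token per pending node in ascending node-id order.

    Every node that has seen the same set of pending joiners (after gossip
    settles) computes the same ordering, so each one can account for the
    tokens of the joiners sorted ahead of it.
    """
    view = list(ring_tokens)
    assigned = {}
    for node_id in sorted(pending_node_ids):
        token = bisect_largest_gap(view)
        assigned[node_id] = token
        view.append(token)  # later joiners see this token as taken
    return assigned


ring = [0, RING_SIZE // 2]

# Without coordination, two simultaneous joiners see the same ring and collide.
uncoordinated = [bisect_largest_gap(ring), bisect_largest_gap(ring)]

# With the intent announcement plus node-id tie-break, tokens are distinct.
coordinated = allocate_with_tie_break(ring, ["node-b", "node-a"])
print(uncoordinated[0] == uncoordinated[1])   # True
print(len(set(coordinated.values())))         # 2
```

The point of sorting by node id is only that the order is the same on every 
node without extra communication; any total order that all joiners agree on 
would serve the same purpose in this sketch.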


