[ https://issues.apache.org/jira/browse/CASSANDRA-19644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847459#comment-17847459 ]
Jon Haddad commented on CASSANDRA-19644:
----------------------------------------

Ah. I didn't see CASSANDRA-16364. My preferred solution is different than what's in there, so I'll drop my comment on that one and close this out.

> deterministic token allocation combined with slow gossip propogation can lead to data loss
> ------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19644
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19644
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jon Haddad
>            Priority: Normal
>
> I've seen several cases now where starting nodes within a somewhat short time
> window (about a minute) when using the default allocation tokens for RF leads
> to token conflicts. Unfortunately this can easily go undetected in medium
> to large clusters.
> When this happens, different nodes in the cluster will have different
> understandings of the topology of the cluster. I've seen this go unnoticed
> in a production environment for several months, leading to data loss, data
> resurrection, and other odd behavior.
> We should apply some randomness to the tokens to ensure that even in the case
> of multiple nodes starting at once, it's still unlikely that they will ever
> have a conflict. Applying a random() value between -2^8 and 2^8 to the token
> value makes it statistically very, very unlikely that we'll ever have a
> collision, while also preserving the balance of token distribution in the
> ring. In the case of 2 nodes starting at the same time, the operator will
> have weird token distribution instead of data loss.
>
> {noformat}
> INFO [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token -1938510198161598815. /10.0.2.134:7000 is the new owner
> INFO [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token -3478858378222500629. /10.0.2.134:7000 is the new owner
> INFO [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 3562748272064835315. /10.0.2.134:7000 is the new owner
> INFO [GossipStage:1] 2024-05-17 22:16:12,333 StorageService.java:3006 - Nodes /10.0.2.134:7000 and cassandra1/10.0.1.61:7000 have the same token 8085185010613503278. /10.0.2.134:7000 is the new owner
> {noformat}

--
This message was sent by Atlassian Jira (v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
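
The jitter proposed in the issue description (a random offset between -2^8 and 2^8 added to each allocated token) could look roughly like the Python sketch below. This is an illustration only, not Cassandra's allocation code: the names `jitter_token` and `jitter_tokens` are hypothetical, and the signed 64-bit token bounds are an assumption about the partitioner in use. The base tokens in the usage lines are the four values from the log excerpt in the issue.

```python
import random

# Assumed token bounds: a signed 64-bit range (an assumption, not taken from the issue).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1
JITTER = 2**8  # offset bound quoted in the issue description


def jitter_token(base_token, rng):
    """Shift base_token by a uniform random integer offset in [-JITTER, JITTER]."""
    offset = rng.randint(-JITTER, JITTER)  # randint is inclusive on both ends
    # Clamp so the result stays inside the assumed token range.
    return max(MIN_TOKEN, min(MAX_TOKEN, base_token + offset))


def jitter_tokens(base_tokens, rng=None):
    """Apply an independent random offset to every token in base_tokens."""
    if rng is None:
        rng = random.Random()
    return [jitter_token(t, rng) for t in base_tokens]


if __name__ == "__main__":
    # Two nodes that deterministically computed the same four base tokens.
    base = [
        -1938510198161598815,
        -3478858378222500629,
        3562748272064835315,
        8085185010613503278,
    ]
    node_a = jitter_tokens(base)
    node_b = jitter_tokens(base)
    # Count how many tokens still collide after jittering.
    print(sum(a == b for a, b in zip(node_a, node_b)), "colliding tokens")
```

Under these assumptions each offset takes one of 2^9 + 1 = 513 values, so two nodes that start from the same base token still land on the same final token with probability 1/513 per token. The shift is tiny relative to the size of the ring, which is why the balance of the token distribution is essentially preserved.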