[ https://issues.apache.org/jira/browse/CASSANDRA-16364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401090#comment-17401090 ]

Roman commented on CASSANDRA-16364:
-----------------------------------

I have hit this issue running 4.0 (inside k8s, where 4 instances start in parallel).

 

It seems that `auto_bootstrap: true` is not the default, as one of the comments 
suggests. In my case, without the option set, one node did eventually join the 
cluster (after 10 restarts); but I have also seen a cluster that was up for a day 
while one of its nodes restarted hundreds of times, always with a token conflict.

 

With `auto_bootstrap: true`, the 4 instances start in parallel, and two of them 
restart 1-2 times due to a bootstrap conflict (which seems to be a separate issue 
from the one above).

 

This was the error before setting `auto_bootstrap: true`:

```
INFO [main] 2021-08-18 02:38:29,032 NetworkTopologyStrategy.java:88 - Configured datacenter replicas are datacenter1:rf(2)
INFO [main] 2021-08-18 02:38:29,034 TokenAllocatorFactory.java:44 - Using ReplicationAwareTokenAllocator.
INFO [main] 2021-08-18 02:38:29,122 TokenAllocation.java:106 - Selected tokens [-869047834665074658, 6571578339392131746, -5974523007943185192, -3644355145115701774, 3287046338630430582, -2401348872989035546, 1849708238101167874, -4749797269495265510]
INFO [main] 2021-08-18 02:38:29,129 StorageService.java:1619 - JOINING: sleeping 30000 ms for pending range setup
INFO [main] 2021-08-18 02:38:59,130 StorageService.java:1619 - JOINING: Starting to bootstrap...
INFO [main] 2021-08-18 02:38:59,147 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(5801172110722970579,6571578339392131746]) exists on Full(/10.96.44.142:7000,(5801172110722970579,7341984568061292914]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-4092359140985418682,-3644355145115701774]) exists on Full(/10.96.59.211:7000,(-4092359140985418682,-3196351149245984865]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-3196351149245984865,-2401348872989035546]) exists on Full(/10.96.59.211:7000,(-3196351149245984865,-1606346596732086227]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(990822151481071145,1849708238101167874]) exists on Full(/10.96.44.142:7000,(990822151481071145,2708594324721264603]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-1606346596732086227,-869047834665074658]) exists on Full(/10.96.44.142:7000,(-1606346596732086227,-131749072598063088]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-6541810617881258046,-5974523007943185192]) exists on Full(/10.96.59.211:7000,(-6541810617881258046,-5407235398005112337]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(2708594324721264603,3287046338630430582]) exists on Full(/10.96.59.211:7000,(2708594324721264603,3865498352539596562]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-5407235398005112337,-4749797269495265510]) exists on Full(/10.96.44.142:7000,(-5407235398005112337,-4092359140985418682]) for keyspace system_auth
java.lang.IllegalStateException: Multiple strict sources found for Full(/10.96.70.81:7000,(8312940956965586630,-9117317883097463910]), sources: [Full(/10.96.44.142:7000,(8312940956965586630,-9117317883097463910]), Full(/10.96.59.211:7000,(8312940956965586630,-9117317883097463910])]
    at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:542)
    at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:408)
    at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:327)
    at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:83)
    at org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:1785)
    at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1762)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1056)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1017)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:799)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
Exception (java.lang.IllegalStateException) encountered during startup: Multiple strict sources found for Full(/10.96.70.81:7000,(8312940956965586630,-9117317883097463910]), sources: [Full(/10.96.44.142:7000,(8312940956965586630,-9117317883097463910]), Full(/10.96.59.211:7000,(8312940956965586630,-9117317883097463910])]
ERROR [main] 2021-08-18 02:38:59,153 CassandraDaemon.java:909 - Exception encountered during startup
java.lang.IllegalStateException: Multiple strict sources found for Full(/10.96.70.81:7000,(8312940956965586630,-9117317883097463910]), sources: [Full(/10.96.44.142:7000,(8312940956965586630,-9117317883097463910]), Full(/10.96.59.211:7000,(8312940956965586630,-9117317883097463910])]
    at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:542)
    at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:408)
    at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:327)
    at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:83)
    at org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:1785)
    at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1762)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1056)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1017)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:799)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
INFO [StorageServiceShutdownHook] 2021-08-18 02:38:59,224 HintsService.java:220 - Paused hints dispatch
```
(after which the Cassandra pod restarts)

> Joining nodes simultaneously with auto_bootstrap:false can cause token 
> collision
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16364
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16364
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Membership
>            Reporter: Paulo Motta
>            Priority: Normal
>             Fix For: 4.0.x
>
>
> While bringing up a 6-node ccm cluster to test 4.0-beta4, 2 nodes chose the same 
> tokens using the default {{allocate_tokens_for_local_rf}}. However, they both 
> completed bootstrap with colliding tokens.
> We were familiar with this issue from CASSANDRA-13701 and CASSANDRA-16079, 
> and the workaround to fix this is to avoid parallel bootstrap when using 
> {{allocate_tokens_for_local_rf}}.
> However, since this is the default behavior, we should try to detect and 
> prevent this situation when possible, since it can break users relying on 
> parallel bootstrap behavior.
> I think we could prevent this as follows:
> 1. announce intent to bootstrap via gossip (ie. add node on gossip without 
> token information)
> 2. wait for gossip to settle for a longer period (ie. ring delay)
> 3. allocate tokens (if multiple bootstrap attempts are detected, tie break 
> via node-id)
> 4. broadcast tokens and move on with bootstrap
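To make step 3 of the proposal above concrete, here is a minimal, hypothetical sketch of a deterministic tie-break by node id between two concurrent bootstrap attempts whose proposed tokens overlap. The `BootstrapIntent` and `loserOf` names are invented for illustration and are not Cassandra APIs:

```java
import java.util.Optional;
import java.util.Set;
import java.util.UUID;

// Hypothetical illustration of step 3 above; these types are NOT part of Cassandra.
public class TokenTieBreakSketch
{
    /** A node's announced, not-yet-committed bootstrap attempt (host id + proposed tokens). */
    record BootstrapIntent(UUID hostId, Set<Long> proposedTokens) {}

    /**
     * If two concurrent bootstrap intents propose overlapping tokens, pick a loser
     * deterministically by comparing host ids, so both nodes reach the same decision
     * from the same gossip state. The loser must re-allocate before broadcasting.
     */
    static Optional<BootstrapIntent> loserOf(BootstrapIntent a, BootstrapIntent b)
    {
        boolean overlap = a.proposedTokens().stream().anyMatch(b.proposedTokens()::contains);
        if (!overlap)
            return Optional.empty();                 // no collision, both may proceed
        return Optional.of(a.hostId().compareTo(b.hostId()) <= 0 ? b : a);
    }

    public static void main(String[] args)
    {
        BootstrapIntent n1 = new BootstrapIntent(UUID.randomUUID(), Set.of(100L, 200L));
        BootstrapIntent n2 = new BootstrapIntent(UUID.randomUUID(), Set.of(200L, 300L));
        loserOf(n1, n2).ifPresent(l -> System.out.println(l.hostId() + " must pick new tokens"));
    }
}
```

The point is only that both nodes can compute the same loser locally from the announced gossip state, so exactly one of them re-allocates its tokens before broadcasting them.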


