[ https://issues.apache.org/jira/browse/CASSANDRA-16364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401090#comment-17401090 ]
Roman commented on CASSANDRA-16364:
-----------------------------------

I have hit this issue running 4.0 (inside k8s, and therefore starting 4 instances in parallel).

It seems that `auto_bootstrap: true` is not the default, as one of the comments suggests. In my case, without the option, one machine eventually joined the cluster (after 10 restarts); but I also observed a situation where a cluster had been up for a day and one of the machines had restarted hundreds of times (always with a token conflict).

With `auto_bootstrap: true`, the 4 instances start in parallel, and two of them restart 1-2 times (due to a bootstrap conflict, but that seems to be a separate issue from the one above).

This was the error before setting `auto_bootstrap: true`:

```
INFO [main] 2021-08-18 02:38:29,032 NetworkTopologyStrategy.java:88 - Configured datacenter replicas are datacenter1:rf(2)
INFO [main] 2021-08-18 02:38:29,034 TokenAllocatorFactory.java:44 - Using ReplicationAwareTokenAllocator.
INFO [main] 2021-08-18 02:38:29,122 TokenAllocation.java:106 - Selected tokens [-869047834665074658, 6571578339392131746, -5974523007943185192, -3644355145115701774, 3287046338630430582, -2401348872989035546, 1849708238101167874, -4749797269495265510]
INFO [main] 2021-08-18 02:38:29,129 StorageService.java:1619 - JOINING: sleeping 30000 ms for pending range setup
INFO [main] 2021-08-18 02:38:59,130 StorageService.java:1619 - JOINING: Starting to bootstrap...
INFO [main] 2021-08-18 02:38:59,147 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(5801172110722970579,6571578339392131746]) exists on Full(/10.96.44.142:7000,(5801172110722970579,7341984568061292914]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-4092359140985418682,-3644355145115701774]) exists on Full(/10.96.59.211:7000,(-4092359140985418682,-3196351149245984865]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-3196351149245984865,-2401348872989035546]) exists on Full(/10.96.59.211:7000,(-3196351149245984865,-1606346596732086227]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(990822151481071145,1849708238101167874]) exists on Full(/10.96.44.142:7000,(990822151481071145,2708594324721264603]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-1606346596732086227,-869047834665074658]) exists on Full(/10.96.44.142:7000,(-1606346596732086227,-131749072598063088]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-6541810617881258046,-5974523007943185192]) exists on Full(/10.96.59.211:7000,(-6541810617881258046,-5407235398005112337]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(2708594324721264603,3287046338630430582]) exists on Full(/10.96.59.211:7000,(2708594324721264603,3865498352539596562]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-5407235398005112337,-4749797269495265510]) exists on Full(/10.96.44.142:7000,(-5407235398005112337,-4092359140985418682]) for keyspace system_auth
java.lang.IllegalStateException: Multiple strict sources found for Full(/10.96.70.81:7000,(8312940956965586630,-9117317883097463910]), sources: [Full(/10.96.44.142:7000,(8312940956965586630,-9117317883097463910]), Full(/10.96.59.211:7000,(8312940956965586630,-9117317883097463910])]
    at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:542)
    at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:408)
    at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:327)
    at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:83)
    at org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:1785)
    at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1762)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1056)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1017)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:799)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
Exception (java.lang.IllegalStateException) encountered during startup: Multiple strict sources found for Full(/10.96.70.81:7000,(8312940956965586630,-9117317883097463910]), sources: [Full(/10.96.44.142:7000,(8312940956965586630,-9117317883097463910]), Full(/10.96.59.211:7000,(8312940956965586630,-9117317883097463910])]
ERROR [main] 2021-08-18 02:38:59,153 CassandraDaemon.java:909 - Exception encountered during startup
java.lang.IllegalStateException: Multiple strict sources found for Full(/10.96.70.81:7000,(8312940956965586630,-9117317883097463910]), sources: [Full(/10.96.44.142:7000,(8312940956965586630,-9117317883097463910]), Full(/10.96.59.211:7000,(8312940956965586630,-9117317883097463910])]
    at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:542)
    at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:408)
    at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:327)
    at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:83)
    at org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:1785)
    at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1762)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1056)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1017)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:799)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
INFO [StorageServiceShutdownHook] 2021-08-18 02:38:59,224 HintsService.java:220 - Paused hints dispatch
```

(after which the Cassandra pod restarts)

> Joining nodes simultaneously with auto_bootstrap:false can cause token
> collision
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16364
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16364
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Membership
>            Reporter: Paulo Motta
>            Priority: Normal
>             Fix For: 4.0.x
>
>
> While raising a 6-node ccm cluster to test 4.0-beta4, 2 nodes chose the same
> tokens using the default {{allocate_tokens_for_local_rf}}. However, they both
> bootstrapped successfully with colliding tokens.
> We were familiar with this issue from CASSANDRA-13701 and CASSANDRA-16079,
> and the workaround to fix this is to avoid parallel bootstrap when using
> {{allocate_tokens_for_local_rf}}.
> However, since this is the default behavior, we should try to detect and
> prevent this situation when possible, since it can break users relying on
> parallel bootstrap behavior.
> I think we could prevent this as follows:
> 1. announce intent to bootstrap via gossip (i.e. add the node to gossip
> without token information)
> 2. wait for gossip to settle for a longer period (i.e. ring delay)
> 3. allocate tokens (if multiple bootstrap attempts are detected, tie-break
> via node-id)
> 4. broadcast tokens and move on with bootstrap
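
The four steps quoted above could be sketched roughly as follows. This is a toy model under stated assumptions, not Cassandra's implementation: `pick_tokens` is a hypothetical deterministic allocator that bisects the widest gap in the ring (a stand-in for the real replication-aware allocator), node ids are plain strings, and gossip is reduced to a shared list of tokens. It only illustrates why two nodes allocating from the same ring view pick the same tokens, and how ordering the allocation by node id would avoid that.

```python
RING = 2**64            # size of the 64-bit token space
MAX_TOKEN = 2**63 - 1   # largest signed 64-bit token


def pick_tokens(ring_tokens, count):
    """Hypothetical deterministic allocator: bisect the widest gap, count times."""
    ring = sorted(ring_tokens)
    picked = []
    for _ in range(count):
        if not ring:
            token = 0
        else:
            best_width, best_start = -1, None
            for i, start in enumerate(ring):
                end = ring[(i + 1) % len(ring)]
                # Gap width with wrap-around; a single token owns the whole ring.
                width = (end - start) % RING or RING
                if width > best_width:
                    best_width, best_start = width, start
            if best_width < 2:
                raise ValueError("no free token left in the widest gap")
            token = best_start + best_width // 2
            if token > MAX_TOKEN:
                token -= RING  # wrap back into the signed 64-bit range
        ring.append(token)
        ring.sort()
        picked.append(token)
    return picked


def allocate_parallel(ring_tokens, node_ids, count):
    """Every joining node allocates from the same view of the ring."""
    return {node_id: pick_tokens(ring_tokens, count) for node_id in node_ids}


def allocate_with_tie_break(ring_tokens, node_ids, count):
    """Steps 1-4 in miniature: node_ids are the announced intents (steps 1-2),
    allocation runs in node-id order (step 3), and each node's tokens are
    added to the shared view before the next node allocates (step 4)."""
    view = list(ring_tokens)
    result = {}
    for node_id in sorted(node_ids):
        tokens = pick_tokens(view, count)
        view.extend(tokens)
        result[node_id] = tokens
    return result


if __name__ == "__main__":
    existing = [0]
    print(allocate_parallel(existing, ["node-b", "node-a"], 4))
    print(allocate_with_tie_break(existing, ["node-b", "node-a"], 4))
```

In the first call both nodes end up with identical token lists, which is the collision described in the ticket; in the second, the node with the lower id allocates first and the other node sees those tokens before choosing its own.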