[ 
https://issues.apache.org/jira/browse/CASSANDRA-18636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated CASSANDRA-18636:
-------------------------------
    Description: 
When running Cassandra cluster (3 nodes) in 3.11.15, I noticed that there is an 
ERROR log message in system.log after drop keyspace command is executed. 
{code:java}
INFO  [Native-Transport-Requests-1] 2023-06-26 06:11:48,575 
MigrationManager.java:205 - Create new Keyspace: 
KeyspaceMetadata{name=uuide1b80964f06a424c8a57884564127745, 
params=KeyspaceParams{durable_writes=true, 
replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
 replication_factor=2}}, tables=[], views=[], functions=[], types=[]}
INFO  [Native-Transport-Requests-1] 2023-06-26 06:11:53,106 
MigrationManager.java:205 - Create new Keyspace: 
KeyspaceMetadata{name=uuidd17c2b7bfea545d0801a3c05e0e5307a, 
params=KeyspaceParams{durable_writes=true, 
replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
 replication_factor=1}}, tables=[], views=[], functions=[], types=[]}
INFO  [Native-Transport-Requests-2] 2023-06-26 06:11:56,673 
MigrationManager.java:235 - Create new table: 
org.apache.cassandra.config.CFMetaData@705b690b[cfId=5a7fcf10-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuidd17c2b7bfea545d0801a3c05e0e5307a,cfName=vzkzguyh,flags=[COMPOUND],params=TableParams{comment=,
 read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, 
max_index_interval=2048, speculative_retry=50.0ms, caching={'keys' : 'ALL', 
'rows_per_partition' : 'NONE'}, 
compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
 options={min_threshold=4, max_threshold=32}}, 
compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
extensions={}, cdc=false},comparator=comparator(),partitionColumns=[[] | 
[condcduzca o ukwhjwbdljqnz 
vzkzguyh]],partitionKeyColumns=[gveohmecfdwx],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.UTF8Type,columnMetadata=[gveohmecfdwx,
 condcduzca, vzkzguyh, o, 
ukwhjwbdljqnz],droppedColumns={},triggers=[],indexes=[]]
INFO  [MigrationStage:1] 2023-06-26 06:12:03,118 ColumnFamilyStore.java:432 - 
Initializing uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh
INFO  [Native-Transport-Requests-9] 2023-06-26 06:12:04,634 
MigrationManager.java:235 - Create new table: 
org.apache.cassandra.config.CFMetaData@61f10dd7[cfId=5f3e8fa0-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuidd17c2b7bfea545d0801a3c05e0e5307a,cfName=o,flags=[COMPOUND],params=TableParams{comment=,
 read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, 
max_index_interval=2048, speculative_retry=ALWAYS, caching={'keys' : 'ALL', 
'rows_per_partition' : 'NONE'}, 
compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
 options={min_threshold=4, max_threshold=32}}, 
compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
extensions={}, cdc=false},comparator=comparator(),partitionColumns=[[] | 
[dzwqnsv gveohmecfdwx o 
ukwhjwbdljqnz]],partitionKeyColumns=[vzkzguyh],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[dzwqnsv,
 vzkzguyh, gveohmecfdwx, o, 
ukwhjwbdljqnz],droppedColumns={},triggers=[],indexes=[]]
INFO  [MigrationStage:1] 2023-06-26 06:12:11,862 ColumnFamilyStore.java:432 - 
Initializing uuidd17c2b7bfea545d0801a3c05e0e5307a.o
INFO  [Native-Transport-Requests-6] 2023-06-26 06:12:13,165 
MigrationManager.java:205 - Create new Keyspace: 
KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44, 
params=KeyspaceParams{durable_writes=true, 
replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
 replication_factor=1}}, tables=[], views=[], functions=[], types=[]}
INFO  [MigrationStage:1] 2023-06-26 06:12:20,413 MigrationCoordinator.java:531 
- Sending schema pull request to /192.168.103.3 at 1687759940413 with timeout 
4287
INFO  [Native-Transport-Requests-3] 2023-06-26 06:12:21,270 
MigrationManager.java:286 - Update Keyspace 
'uuid1602df26aef747d6af8f6d0c36fb1a44' From 
KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44, 
params=KeyspaceParams{durable_writes=true, 
replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
 replication_factor=1}}, tables=[], views=[], functions=[], 
types=[org.apache.cassandra.db.marshal.UserType(uuid1602df26aef747d6af8f6d0c36fb1a44,756c76796163,71:org.apache.cassandra.db.marshal.UTF8Type,756c76796163:org.apache.cassandra.db.marshal.Int32Type)]}
 To KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44, 
params=KeyspaceParams{durable_writes=true, 
replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
 replication_factor=3}}, tables=[], views=[], functions=[], 
types=[org.apache.cassandra.db.marshal.UserType(uuid1602df26aef747d6af8f6d0c36fb1a44,756c76796163,71:org.apache.cassandra.db.marshal.UTF8Type,756c76796163:org.apache.cassandra.db.marshal.Int32Type)]}
INFO  [Native-Transport-Requests-1] 2023-06-26 06:12:24,646 
MigrationManager.java:235 - Create new table: 
org.apache.cassandra.config.CFMetaData@132ca1e[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=,
 read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, 
max_index_interval=2048, speculative_retry=40PERCENTILE, caching={'keys' : 
'ALL', 'rows_per_partition' : 'NONE'}, 
compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
 options={min_threshold=4, max_threshold=32}}, 
compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
extensions={}, 
cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[]
 | 
[xwtypuqy]],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[xwtypuqy,
 jsxcfdobqbmirwxs, fkbelwg],droppedColumns={},triggers=[],indexes=[]]
INFO  [MigrationStage:1] 2023-06-26 06:12:31,863 ColumnFamilyStore.java:432 - 
Initializing uuid1602df26aef747d6af8f6d0c36fb1a44.q
INFO  [MutationStage-5] 2023-06-26 06:12:32,571 ColumnFamilyStore.java:2250 - 
Truncating uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh
INFO  [MutationStage-5] 2023-06-26 06:12:35,128 ColumnFamilyStore.java:2304 - 
Truncate of uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh is complete
INFO  [MutationStage-4] 2023-06-26 06:12:35,134 ColumnFamilyStore.java:2250 - 
Truncating uuid1602df26aef747d6af8f6d0c36fb1a44.q
INFO  [MutationStage-4] 2023-06-26 06:12:37,524 ColumnFamilyStore.java:2304 - 
Truncate of uuid1602df26aef747d6af8f6d0c36fb1a44.q is complete
INFO  [Native-Transport-Requests-4] 2023-06-26 06:12:37,529 
MigrationManager.java:362 - Drop Keyspace 'uuidd17c2b7bfea545d0801a3c05e0e5307a'
INFO  [Native-Transport-Requests-1] 2023-06-26 06:13:07,622 
MigrationManager.java:313 - Update table 
'uuid1602df26aef747d6af8f6d0c36fb1a44/q' From 
org.apache.cassandra.config.CFMetaData@602b98fd[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=,
 read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, 
max_index_interval=2048, speculative_retry=40PERCENTILE, caching={'keys' : 
'ALL', 'rows_per_partition' : 'NONE'}, 
compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
 options={max_threshold=32, min_threshold=4}}, 
compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
extensions={}, 
cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[]
 | 
[xwtypuqy]],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[xwtypuqy,
 jsxcfdobqbmirwxs, fkbelwg],droppedColumns={},triggers=[],indexes=[]] To 
org.apache.cassandra.config.CFMetaData@cfd4a9a[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=,
 read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, 
max_index_interval=2048, speculative_retry=40PERCENTILE, caching={'keys' : 
'ALL', 'rows_per_partition' : 'NONE'}, 
compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
 options={max_threshold=32, min_threshold=4}}, 
compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
extensions={}, 
cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[]
 | 
[]],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[jsxcfdobqbmirwxs,
 fkbelwg],droppedColumns={java.nio.HeapByteBuffer[pos=0 lim=8 
cap=8]=DroppedColumn{name=xwtypuqy, kind=REGULAR, 
type=org.apache.cassandra.db.marshal.Int32Type, 
droppedTime=1687759987622000}},triggers=[],indexes=[]]
ERROR [OptionalTasks:1] 2023-06-26 06:13:14,070 CassandraDaemon.java:244 - 
Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.AssertionError: Unknown keyspace uuidd17c2b7bfea545d0801a3c05e0e5307a
        at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316)
        at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129)
        at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106)
        at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:92)
        at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:89)
        at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
        at 
com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
        at 
org.apache.cassandra.db.SizeEstimatesRecorder.run(SizeEstimatesRecorder.java:72)
        at 
org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
        at java.lang.Thread.run(Thread.java:750)
{code}
 

>From my understanding, this should be a concurrency bug. The ERROR message 
>comes from *_SizeEstimatesRecorder_* thread when it tries to init and iterate 
>the keyspaces to get the size information. This thread might be invoked from 
>time to time, controlled by DebuggableScheduledThreadPoolExecutor.

The error message occurs when there's a thread interleaving between
 # Thread1: *_SizeEstimatesRecorder_* thread and
 # Thread2: Thread performing {_}drop keyspace{_}. (mutation or migration 
thread)

In short, thread1 fetches the keyspace names => thread2 updates the schema by 
removing a specific keyspace => thread1 tries to fetch the schema from the 
dropped keyspace, and error message is shown.

 

More specifically, here's how it happens from my preliminary analysis:

(1) Thread1 tries to iterate all the keyspaces from
{code:java}
public void run()
{
    TokenMetadata metadata = 
StorageService.instance.getTokenMetadata().cloneOnlyTokenMap();
    if (!metadata.isMember(FBUtilities.getBroadcastAddress()))
    {
        logger.debug("Node is not part of the ring; not recording size 
estimates");
        return;
    }    logger.trace("Recording size estimates");    for (Keyspace keyspace : 
Keyspace.nonLocalStrategy())
    {{code}
It invokes _Schema.instance.getNonLocalStrategyKeyspaces()_ in 
_nonLocalStrategy_ function
{code:java}
public static Iterable<Keyspace> nonLocalStrategy()
{
    return Iterables.transform(Schema.instance.getNonLocalStrategyKeyspaces(), 
keyspaceTransformer);
}
{code}
getNonLocalStrategyKeyspaces returns a list of *keyspace names* from current 
schema.
{code:java}
/**
 * @return a collection of keyspaces that do not use LocalStrategy for 
replication
 */
public List<String> getNonLocalStrategyKeyspaces()
{
    return keyspaces.values().stream()
            .filter(keyspace -> keyspace.params.replication.klass != 
LocalStrategy.class)
            .map(keyspace -> keyspace.name)
            .collect(Collectors.toList());
} {code}
(2) Thread2 performs the _drop keyspace_ and {_}updates the schema{_}. 

(3) Thread1: for each keyspace name, it will invoke the <init> function of 
Keyspace, when it tries to fetch the metadata from the dropped keyspace name, 
it gets null and throws and ERROR exception.
{code:java}
private Keyspace(String keyspaceName, boolean loadSSTables)
{
    metadata = Schema.instance.getKSMetaData(keyspaceName);
    assert metadata != null : "Unknown keyspace " + keyspaceName;
    createReplicationStrategy(metadata); {code}
 

This error message won't stop Cassandra from running. But I think this could be 
handled considering its ERROR level. E.g. if a user wants to perform a 
migration from 3.x to 4.x, [Cassandra migration 
doc|https://www.datastax.com/learn/whats-new-for-cassandra-4/migrating-cassandra-4x]
 mentions that "All errors need to be handled". This error message could 
confuse users about whether the data is still in an integrated state.

  was:
When running Cassandra cluster (3 nodes) in 3.11.15, I noticed that there is an 
ERROR log message in system.log after drop keyspace command is executed. 
{code:java}
INFO  [Native-Transport-Requests-1] 2023-06-26 06:11:48,575 
MigrationManager.java:205 - Create new Keyspace: 
KeyspaceMetadata{name=uuide1b80964f06a424c8a57884564127745, 
params=KeyspaceParams{durable_writes=true, 
replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
 replication_factor=2}}, tables=[], views=[], functions=[], types=[]}
INFO  [Native-Transport-Requests-1] 2023-06-26 06:11:53,106 
MigrationManager.java:205 - Create new Keyspace: 
KeyspaceMetadata{name=uuidd17c2b7bfea545d0801a3c05e0e5307a, 
params=KeyspaceParams{durable_writes=true, 
replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
 replication_factor=1}}, tables=[], views=[], functions=[], types=[]}
INFO  [Native-Transport-Requests-2] 2023-06-26 06:11:56,673 
MigrationManager.java:235 - Create new table: 
org.apache.cassandra.config.CFMetaData@705b690b[cfId=5a7fcf10-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuidd17c2b7bfea545d0801a3c05e0e5307a,cfName=vzkzguyh,flags=[COMPOUND],params=TableParams{comment=,
 read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, 
max_index_interval=2048, speculative_retry=50.0ms, caching={'keys' : 'ALL', 
'rows_per_partition' : 'NONE'}, 
compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
 options={min_threshold=4, max_threshold=32}}, 
compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
extensions={}, cdc=false},comparator=comparator(),partitionColumns=[[] | 
[condcduzca o ukwhjwbdljqnz 
vzkzguyh]],partitionKeyColumns=[gveohmecfdwx],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.UTF8Type,columnMetadata=[gveohmecfdwx,
 condcduzca, vzkzguyh, o, 
ukwhjwbdljqnz],droppedColumns={},triggers=[],indexes=[]]
INFO  [MigrationStage:1] 2023-06-26 06:12:03,118 ColumnFamilyStore.java:432 - 
Initializing uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh
INFO  [Native-Transport-Requests-9] 2023-06-26 06:12:04,634 
MigrationManager.java:235 - Create new table: 
org.apache.cassandra.config.CFMetaData@61f10dd7[cfId=5f3e8fa0-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuidd17c2b7bfea545d0801a3c05e0e5307a,cfName=o,flags=[COMPOUND],params=TableParams{comment=,
 read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, 
max_index_interval=2048, speculative_retry=ALWAYS, caching={'keys' : 'ALL', 
'rows_per_partition' : 'NONE'}, 
compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
 options={min_threshold=4, max_threshold=32}}, 
compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
extensions={}, cdc=false},comparator=comparator(),partitionColumns=[[] | 
[dzwqnsv gveohmecfdwx o 
ukwhjwbdljqnz]],partitionKeyColumns=[vzkzguyh],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[dzwqnsv,
 vzkzguyh, gveohmecfdwx, o, 
ukwhjwbdljqnz],droppedColumns={},triggers=[],indexes=[]]
INFO  [MigrationStage:1] 2023-06-26 06:12:11,862 ColumnFamilyStore.java:432 - 
Initializing uuidd17c2b7bfea545d0801a3c05e0e5307a.o
INFO  [Native-Transport-Requests-6] 2023-06-26 06:12:13,165 
MigrationManager.java:205 - Create new Keyspace: 
KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44, 
params=KeyspaceParams{durable_writes=true, 
replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
 replication_factor=1}}, tables=[], views=[], functions=[], types=[]}
INFO  [MigrationStage:1] 2023-06-26 06:12:20,413 MigrationCoordinator.java:531 
- Sending schema pull request to /192.168.103.3 at 1687759940413 with timeout 
4287
INFO  [Native-Transport-Requests-3] 2023-06-26 06:12:21,270 
MigrationManager.java:286 - Update Keyspace 
'uuid1602df26aef747d6af8f6d0c36fb1a44' From 
KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44, 
params=KeyspaceParams{durable_writes=true, 
replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
 replication_factor=1}}, tables=[], views=[], functions=[], 
types=[org.apache.cassandra.db.marshal.UserType(uuid1602df26aef747d6af8f6d0c36fb1a44,756c76796163,71:org.apache.cassandra.db.marshal.UTF8Type,756c76796163:org.apache.cassandra.db.marshal.Int32Type)]}
 To KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44, 
params=KeyspaceParams{durable_writes=true, 
replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
 replication_factor=3}}, tables=[], views=[], functions=[], 
types=[org.apache.cassandra.db.marshal.UserType(uuid1602df26aef747d6af8f6d0c36fb1a44,756c76796163,71:org.apache.cassandra.db.marshal.UTF8Type,756c76796163:org.apache.cassandra.db.marshal.Int32Type)]}
INFO  [Native-Transport-Requests-1] 2023-06-26 06:12:24,646 
MigrationManager.java:235 - Create new table: 
org.apache.cassandra.config.CFMetaData@132ca1e[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=,
 read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, 
max_index_interval=2048, speculative_retry=40PERCENTILE, caching={'keys' : 
'ALL', 'rows_per_partition' : 'NONE'}, 
compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
 options={min_threshold=4, max_threshold=32}}, 
compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
extensions={}, 
cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[]
 | 
[xwtypuqy]],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[xwtypuqy,
 jsxcfdobqbmirwxs, fkbelwg],droppedColumns={},triggers=[],indexes=[]]
INFO  [MigrationStage:1] 2023-06-26 06:12:31,863 ColumnFamilyStore.java:432 - 
Initializing uuid1602df26aef747d6af8f6d0c36fb1a44.q
INFO  [MutationStage-5] 2023-06-26 06:12:32,571 ColumnFamilyStore.java:2250 - 
Truncating uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh
INFO  [MutationStage-5] 2023-06-26 06:12:35,128 ColumnFamilyStore.java:2304 - 
Truncate of uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh is complete
INFO  [MutationStage-4] 2023-06-26 06:12:35,134 ColumnFamilyStore.java:2250 - 
Truncating uuid1602df26aef747d6af8f6d0c36fb1a44.q
INFO  [MutationStage-4] 2023-06-26 06:12:37,524 ColumnFamilyStore.java:2304 - 
Truncate of uuid1602df26aef747d6af8f6d0c36fb1a44.q is complete
INFO  [Native-Transport-Requests-4] 2023-06-26 06:12:37,529 
MigrationManager.java:362 - Drop Keyspace 'uuidd17c2b7bfea545d0801a3c05e0e5307a'
INFO  [Native-Transport-Requests-1] 2023-06-26 06:13:07,622 
MigrationManager.java:313 - Update table 
'uuid1602df26aef747d6af8f6d0c36fb1a44/q' From 
org.apache.cassandra.config.CFMetaData@602b98fd[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=,
 read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, 
max_index_interval=2048, speculative_retry=40PERCENTILE, caching={'keys' : 
'ALL', 'rows_per_partition' : 'NONE'}, 
compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
 options={max_threshold=32, min_threshold=4}}, 
compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
extensions={}, 
cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[]
 | 
[xwtypuqy]],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[xwtypuqy,
 jsxcfdobqbmirwxs, fkbelwg],droppedColumns={},triggers=[],indexes=[]] To 
org.apache.cassandra.config.CFMetaData@cfd4a9a[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=,
 read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, 
max_index_interval=2048, speculative_retry=40PERCENTILE, caching={'keys' : 
'ALL', 'rows_per_partition' : 'NONE'}, 
compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
 options={max_threshold=32, min_threshold=4}}, 
compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
extensions={}, 
cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[]
 | 
[]],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[jsxcfdobqbmirwxs,
 fkbelwg],droppedColumns={java.nio.HeapByteBuffer[pos=0 lim=8 
cap=8]=DroppedColumn{name=xwtypuqy, kind=REGULAR, 
type=org.apache.cassandra.db.marshal.Int32Type, 
droppedTime=1687759987622000}},triggers=[],indexes=[]]
ERROR [OptionalTasks:1] 2023-06-26 06:13:14,070 CassandraDaemon.java:244 - 
Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.AssertionError: Unknown keyspace uuidd17c2b7bfea545d0801a3c05e0e5307a
        at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316)
        at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129)
        at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106)
        at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:92)
        at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:89)
        at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
        at 
com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
        at 
org.apache.cassandra.db.SizeEstimatesRecorder.run(SizeEstimatesRecorder.java:72)
        at 
org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
        at java.lang.Thread.run(Thread.java:750)
{code}
 

>From my understanding, this should be a concurrency bug. The ERROR message 
>comes from *_SizeEstimatesRecorder_* thread when it tries to init and iterate 
>the keyspaces to get the size information. This thread might be invoked from 
>time to time, controlled by DebuggableScheduledThreadPoolExecutor.

The error message occurs when there's a thread interleaving between
 # Thread1: *_SizeEstimatesRecorder_* thread and
 # Thread2: Thread performing {_}drop keyspace{_}. (mutation or migration 
thread)

In short, thread1 fetches the keyspace names => thread2 updates the schema by 
removing a specific keyspace => thread1 tries to fetch the schema from the 
dropped keyspace, and error message is shown.

 

More specifically, here's how it happens from my preliminary analysis:

(1) Thread1 tries to iterate all the keyspaces from
{code:java}
public void run()
{
    TokenMetadata metadata = 
StorageService.instance.getTokenMetadata().cloneOnlyTokenMap();
    if (!metadata.isMember(FBUtilities.getBroadcastAddress()))
    {
        logger.debug("Node is not part of the ring; not recording size 
estimates");
        return;
    }    logger.trace("Recording size estimates");    for (Keyspace keyspace : 
Keyspace.nonLocalStrategy())
    {{code}
It invokes _Schema.instance.getNonLocalStrategyKeyspaces()_ in 
_nonLocalStrategy_ function
{code:java}
    public static Iterable<Keyspace> nonLocalStrategy()    {        return 
Iterables.transform(Schema.instance.getNonLocalStrategyKeyspaces(), 
keyspaceTransformer);    } {code}
getNonLocalStrategyKeyspaces returns a list of *keyspace names* from current 
schema.
{code:java}
/**
 * @return a collection of keyspaces that do not use LocalStrategy for 
replication
 */
public List<String> getNonLocalStrategyKeyspaces()
{
    return keyspaces.values().stream()
            .filter(keyspace -> keyspace.params.replication.klass != 
LocalStrategy.class)
            .map(keyspace -> keyspace.name)
            .collect(Collectors.toList());
} {code}
(2) Thread2 performs the _drop keyspace_ and {_}updates the schema{_}. 

(3) Thread1: for each keyspace name, it will invoke the <init> function of 
Keyspace, when it tries to fetch the metadata from the dropped keyspace name, 
it gets null and throws and ERROR exception.
{code:java}
private Keyspace(String keyspaceName, boolean loadSSTables)
{
    metadata = Schema.instance.getKSMetaData(keyspaceName);
    assert metadata != null : "Unknown keyspace " + keyspaceName;
    createReplicationStrategy(metadata); {code}
 

This error message won't stop Cassandra from running. But I think this could be 
handled considering its ERROR level. E.g. if a user wants to perform a 
migration from 3.x to 4.x, [Cassandra migration 
doc|https://www.datastax.com/learn/whats-new-for-cassandra-4/migrating-cassandra-4x]
 mentions that "All errors need to be handled". This error message could 
confuse users about whether the data is still in an integrated state.


> AssertionError: Unknown keyspace after dropping a keyspace
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-18636
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18636
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ke Han
>            Priority: Normal
>
> When running Cassandra cluster (3 nodes) in 3.11.15, I noticed that there is 
> an ERROR log message in system.log after drop keyspace command is executed. 
> {code:java}
> INFO  [Native-Transport-Requests-1] 2023-06-26 06:11:48,575 
> MigrationManager.java:205 - Create new Keyspace: 
> KeyspaceMetadata{name=uuide1b80964f06a424c8a57884564127745, 
> params=KeyspaceParams{durable_writes=true, 
> replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
>  replication_factor=2}}, tables=[], views=[], functions=[], types=[]}
> INFO  [Native-Transport-Requests-1] 2023-06-26 06:11:53,106 
> MigrationManager.java:205 - Create new Keyspace: 
> KeyspaceMetadata{name=uuidd17c2b7bfea545d0801a3c05e0e5307a, 
> params=KeyspaceParams{durable_writes=true, 
> replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
>  replication_factor=1}}, tables=[], views=[], functions=[], types=[]}
> INFO  [Native-Transport-Requests-2] 2023-06-26 06:11:56,673 
> MigrationManager.java:235 - Create new table: 
> org.apache.cassandra.config.CFMetaData@705b690b[cfId=5a7fcf10-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuidd17c2b7bfea545d0801a3c05e0e5307a,cfName=vzkzguyh,flags=[COMPOUND],params=TableParams{comment=,
>  read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
> bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
> default_time_to_live=0, memtable_flush_period_in_ms=0, 
> min_index_interval=128, max_index_interval=2048, speculative_retry=50.0ms, 
> caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, 
> compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
>  options={min_threshold=4, max_threshold=32}}, 
> compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
> extensions={}, cdc=false},comparator=comparator(),partitionColumns=[[] | 
> [condcduzca o ukwhjwbdljqnz 
> vzkzguyh]],partitionKeyColumns=[gveohmecfdwx],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.UTF8Type,columnMetadata=[gveohmecfdwx,
>  condcduzca, vzkzguyh, o, 
> ukwhjwbdljqnz],droppedColumns={},triggers=[],indexes=[]]
> INFO  [MigrationStage:1] 2023-06-26 06:12:03,118 ColumnFamilyStore.java:432 - 
> Initializing uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh
> INFO  [Native-Transport-Requests-9] 2023-06-26 06:12:04,634 
> MigrationManager.java:235 - Create new table: 
> org.apache.cassandra.config.CFMetaData@61f10dd7[cfId=5f3e8fa0-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuidd17c2b7bfea545d0801a3c05e0e5307a,cfName=o,flags=[COMPOUND],params=TableParams{comment=,
>  read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
> bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
> default_time_to_live=0, memtable_flush_period_in_ms=0, 
> min_index_interval=128, max_index_interval=2048, speculative_retry=ALWAYS, 
> caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, 
> compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
>  options={min_threshold=4, max_threshold=32}}, 
> compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
> extensions={}, cdc=false},comparator=comparator(),partitionColumns=[[] | 
> [dzwqnsv gveohmecfdwx o 
> ukwhjwbdljqnz]],partitionKeyColumns=[vzkzguyh],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[dzwqnsv,
>  vzkzguyh, gveohmecfdwx, o, 
> ukwhjwbdljqnz],droppedColumns={},triggers=[],indexes=[]]
> INFO  [MigrationStage:1] 2023-06-26 06:12:11,862 ColumnFamilyStore.java:432 - 
> Initializing uuidd17c2b7bfea545d0801a3c05e0e5307a.o
> INFO  [Native-Transport-Requests-6] 2023-06-26 06:12:13,165 
> MigrationManager.java:205 - Create new Keyspace: 
> KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44, 
> params=KeyspaceParams{durable_writes=true, 
> replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
>  replication_factor=1}}, tables=[], views=[], functions=[], types=[]}
> INFO  [MigrationStage:1] 2023-06-26 06:12:20,413 
> MigrationCoordinator.java:531 - Sending schema pull request to /192.168.103.3 
> at 1687759940413 with timeout 4287
> INFO  [Native-Transport-Requests-3] 2023-06-26 06:12:21,270 
> MigrationManager.java:286 - Update Keyspace 
> 'uuid1602df26aef747d6af8f6d0c36fb1a44' >From 
> KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44, 
> params=KeyspaceParams{durable_writes=true, 
> replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
>  replication_factor=1}}, tables=[], views=[], functions=[], 
> types=[org.apache.cassandra.db.marshal.UserType(uuid1602df26aef747d6af8f6d0c36fb1a44,756c76796163,71:org.apache.cassandra.db.marshal.UTF8Type,756c76796163:org.apache.cassandra.db.marshal.Int32Type)]}
>  To KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44, 
> params=KeyspaceParams{durable_writes=true, 
> replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
>  replication_factor=3}}, tables=[], views=[], functions=[], 
> types=[org.apache.cassandra.db.marshal.UserType(uuid1602df26aef747d6af8f6d0c36fb1a44,756c76796163,71:org.apache.cassandra.db.marshal.UTF8Type,756c76796163:org.apache.cassandra.db.marshal.Int32Type)]}
> INFO  [Native-Transport-Requests-1] 2023-06-26 06:12:24,646 
> MigrationManager.java:235 - Create new table: 
> org.apache.cassandra.config.CFMetaData@132ca1e[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=,
>  read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
> bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
> default_time_to_live=0, memtable_flush_period_in_ms=0, 
> min_index_interval=128, max_index_interval=2048, 
> speculative_retry=40PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' 
> : 'NONE'}, 
> compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
>  options={min_threshold=4, max_threshold=32}}, 
> compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
> extensions={}, 
> cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[]
>  | 
> [xwtypuqy]],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[xwtypuqy,
>  jsxcfdobqbmirwxs, fkbelwg],droppedColumns={},triggers=[],indexes=[]]
> INFO  [MigrationStage:1] 2023-06-26 06:12:31,863 ColumnFamilyStore.java:432 - 
> Initializing uuid1602df26aef747d6af8f6d0c36fb1a44.q
> INFO  [MutationStage-5] 2023-06-26 06:12:32,571 ColumnFamilyStore.java:2250 - 
> Truncating uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh
> INFO  [MutationStage-5] 2023-06-26 06:12:35,128 ColumnFamilyStore.java:2304 - 
> Truncate of uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh is complete
> INFO  [MutationStage-4] 2023-06-26 06:12:35,134 ColumnFamilyStore.java:2250 - 
> Truncating uuid1602df26aef747d6af8f6d0c36fb1a44.q
> INFO  [MutationStage-4] 2023-06-26 06:12:37,524 ColumnFamilyStore.java:2304 - 
> Truncate of uuid1602df26aef747d6af8f6d0c36fb1a44.q is complete
> INFO  [Native-Transport-Requests-4] 2023-06-26 06:12:37,529 
> MigrationManager.java:362 - Drop Keyspace 
> 'uuidd17c2b7bfea545d0801a3c05e0e5307a'
> INFO  [Native-Transport-Requests-1] 2023-06-26 06:13:07,622 
> MigrationManager.java:313 - Update table 
> 'uuid1602df26aef747d6af8f6d0c36fb1a44/q' From 
> org.apache.cassandra.config.CFMetaData@602b98fd[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=,
>  read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
> bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
> default_time_to_live=0, memtable_flush_period_in_ms=0, 
> min_index_interval=128, max_index_interval=2048, 
> speculative_retry=40PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' 
> : 'NONE'}, 
> compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
>  options={max_threshold=32, min_threshold=4}}, 
> compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
> extensions={}, 
> cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[]
>  | 
> [xwtypuqy]],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[xwtypuqy,
>  jsxcfdobqbmirwxs, fkbelwg],droppedColumns={},triggers=[],indexes=[]] To 
> org.apache.cassandra.config.CFMetaData@cfd4a9a[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=,
>  read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
> bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
> default_time_to_live=0, memtable_flush_period_in_ms=0, 
> min_index_interval=128, max_index_interval=2048, 
> speculative_retry=40PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' 
> : 'NONE'}, 
> compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
>  options={max_threshold=32, min_threshold=4}}, 
> compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
> extensions={}, 
> cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[]
>  | 
> []],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[jsxcfdobqbmirwxs,
>  fkbelwg],droppedColumns={java.nio.HeapByteBuffer[pos=0 lim=8 
> cap=8]=DroppedColumn{name=xwtypuqy, kind=REGULAR, 
> type=org.apache.cassandra.db.marshal.Int32Type, 
> droppedTime=1687759987622000}},triggers=[],indexes=[]]
> ERROR [OptionalTasks:1] 2023-06-26 06:13:14,070 CassandraDaemon.java:244 - 
> Exception in thread Thread[OptionalTasks:1,5,main]
> java.lang.AssertionError: Unknown keyspace 
> uuidd17c2b7bfea545d0801a3c05e0e5307a
>         at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316)
>         at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129)
>         at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106)
>         at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:92)
>         at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:89)
>         at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
>         at 
> com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
>         at 
> org.apache.cassandra.db.SizeEstimatesRecorder.run(SizeEstimatesRecorder.java:72)
>         at 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
>         at java.lang.Thread.run(Thread.java:750)
> {code}
>  
> From my understanding, this should be a concurrency bug. The ERROR message 
> comes from *_SizeEstimatesRecorder_* thread when it tries to init and iterate 
> the keyspaces to get the size information. This thread might be invoked from 
> time to time, controlled by DebuggableScheduledThreadPoolExecutor.
> The error message occurs when there's a thread interleaving between
>  # Thread1: *_SizeEstimatesRecorder_* thread and
>  # Thread2: Thread performing {_}drop keyspace{_}. (mutation or migration 
> thread)
> In short, thread1 fetches the keyspace names => thread2 updates the schema by 
> removing a specific keyspace => thread1 tries to fetch the schema from the 
> dropped keyspace, and error message is shown.
>  
> More specifically, here's how it happens from my preliminary analysis:
> (1) Thread1 tries to iterate all the keyspaces from
> {code:java}
> public void run()
> {
>     TokenMetadata metadata = 
> StorageService.instance.getTokenMetadata().cloneOnlyTokenMap();
>     if (!metadata.isMember(FBUtilities.getBroadcastAddress()))
>     {
>         logger.debug("Node is not part of the ring; not recording size 
> estimates");
>         return;
>     }    logger.trace("Recording size estimates");    for (Keyspace keyspace 
> : Keyspace.nonLocalStrategy())
>     {{code}
> It invokes _Schema.instance.getNonLocalStrategyKeyspaces()_ in 
> _nonLocalStrategy_ function
> {code:java}
> public static Iterable<Keyspace> nonLocalStrategy()
> {
>     return 
> Iterables.transform(Schema.instance.getNonLocalStrategyKeyspaces(), 
> keyspaceTransformer);
> }
> {code}
> getNonLocalStrategyKeyspaces returns a list of *keyspace names* from current 
> schema.
> {code:java}
> /**
>  * @return a collection of keyspaces that do not use LocalStrategy for 
> replication
>  */
> public List<String> getNonLocalStrategyKeyspaces()
> {
>     return keyspaces.values().stream()
>             .filter(keyspace -> keyspace.params.replication.klass != 
> LocalStrategy.class)
>             .map(keyspace -> keyspace.name)
>             .collect(Collectors.toList());
> } {code}
> (2) Thread2 performs the _drop keyspace_ and {_}updates the schema{_}. 
> (3) Thread1: for each keyspace name, it will invoke the <init> function of 
> Keyspace, when it tries to fetch the metadata from the dropped keyspace name, 
> it gets null and throws and ERROR exception.
> {code:java}
> private Keyspace(String keyspaceName, boolean loadSSTables)
> {
>     metadata = Schema.instance.getKSMetaData(keyspaceName);
>     assert metadata != null : "Unknown keyspace " + keyspaceName;
>     createReplicationStrategy(metadata); {code}
>  
> This error message won't stop Cassandra from running. But I think this could 
> be handled considering its ERROR level. E.g. if a user wants to perform a 
> migration from 3.x to 4.x, [Cassandra migration 
> doc|https://www.datastax.com/learn/whats-new-for-cassandra-4/migrating-cassandra-4x]
>  mentions that "All errors need to be handled". This error message could 
> confuse users about whether the data is still in an integrated state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org


Reply via email to