[ https://issues.apache.org/jira/browse/CASSANDRA-18636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765898#comment-17765898 ]
Ke Han commented on CASSANDRA-18636: ------------------------------------ Hi [~smiklosovic], I found another possible thread interleaving (CASSANDRA-18858) which could lead to a similar error message: Interleaving between thread updating the metadata and thread performing drain. I think your patch might be able to fix both of them if exhausting the other possible references to the old copy of keyspaces. > AssertionError: Unknown keyspace after dropping a keyspace > ---------------------------------------------------------- > > Key: CASSANDRA-18636 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18636 > Project: Cassandra > Issue Type: Bug > Reporter: Ke Han > Priority: Normal > Attachments: system.log > > Time Spent: 10m > Remaining Estimate: 0h > > When running Cassandra cluster (3 nodes) in 3.11.15, I noticed that there is > an ERROR message in system.log after drop keyspace command is executed. > {code:java} > INFO [Native-Transport-Requests-1] 2023-06-26 06:11:48,575 > MigrationManager.java:205 - Create new Keyspace: > KeyspaceMetadata{name=uuide1b80964f06a424c8a57884564127745, > params=KeyspaceParams{durable_writes=true, > replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy, > replication_factor=2}}, tables=[], views=[], functions=[], types=[]} > INFO [Native-Transport-Requests-1] 2023-06-26 06:11:53,106 > MigrationManager.java:205 - Create new Keyspace: > KeyspaceMetadata{name=uuidd17c2b7bfea545d0801a3c05e0e5307a, > params=KeyspaceParams{durable_writes=true, > replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy, > replication_factor=1}}, tables=[], views=[], functions=[], types=[]} > INFO [Native-Transport-Requests-2] 2023-06-26 06:11:56,673 > MigrationManager.java:235 - Create new table: > org.apache.cassandra.config.CFMetaData@705b690b[cfId=5a7fcf10-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuidd17c2b7bfea545d0801a3c05e0e5307a,cfName=vzkzguyh,flags=[COMPOUND],params=TableParams{comment=, > read_repair_chance=0.0, dclocal_read_repair_chance=0.1, > bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, > default_time_to_live=0, memtable_flush_period_in_ms=0, > min_index_interval=128, max_index_interval=2048, speculative_retry=50.0ms, > caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, > compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, > options={min_threshold=4, max_threshold=32}}, > compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, > extensions={}, cdc=false},comparator=comparator(),partitionColumns=[[] | > [condcduzca o ukwhjwbdljqnz > vzkzguyh]],partitionKeyColumns=[gveohmecfdwx],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.UTF8Type,columnMetadata=[gveohmecfdwx, > condcduzca, vzkzguyh, o, > ukwhjwbdljqnz],droppedColumns={},triggers=[],indexes=[]] > INFO [MigrationStage:1] 2023-06-26 06:12:03,118 ColumnFamilyStore.java:432 - > Initializing uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh > INFO [Native-Transport-Requests-9] 2023-06-26 06:12:04,634 > MigrationManager.java:235 - Create new table: > org.apache.cassandra.config.CFMetaData@61f10dd7[cfId=5f3e8fa0-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuidd17c2b7bfea545d0801a3c05e0e5307a,cfName=o,flags=[COMPOUND],params=TableParams{comment=, > read_repair_chance=0.0, dclocal_read_repair_chance=0.1, > bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, > default_time_to_live=0, memtable_flush_period_in_ms=0, > min_index_interval=128, max_index_interval=2048, speculative_retry=ALWAYS, > caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, > compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, > options={min_threshold=4, max_threshold=32}}, > compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, > extensions={}, cdc=false},comparator=comparator(),partitionColumns=[[] | > [dzwqnsv gveohmecfdwx o > ukwhjwbdljqnz]],partitionKeyColumns=[vzkzguyh],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[dzwqnsv, > vzkzguyh, gveohmecfdwx, o, > ukwhjwbdljqnz],droppedColumns={},triggers=[],indexes=[]] > INFO [MigrationStage:1] 2023-06-26 06:12:11,862 ColumnFamilyStore.java:432 - > Initializing uuidd17c2b7bfea545d0801a3c05e0e5307a.o > INFO [Native-Transport-Requests-6] 2023-06-26 06:12:13,165 > MigrationManager.java:205 - Create new Keyspace: > KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44, > params=KeyspaceParams{durable_writes=true, > replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy, > replication_factor=1}}, tables=[], views=[], functions=[], types=[]} > INFO [MigrationStage:1] 2023-06-26 06:12:20,413 > MigrationCoordinator.java:531 - Sending schema pull request to /192.168.103.3 > at 1687759940413 with timeout 4287 > INFO [Native-Transport-Requests-3] 2023-06-26 06:12:21,270 > MigrationManager.java:286 - Update Keyspace > 'uuid1602df26aef747d6af8f6d0c36fb1a44' >From > KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44, > params=KeyspaceParams{durable_writes=true, > replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy, > replication_factor=1}}, tables=[], views=[], functions=[], > types=[org.apache.cassandra.db.marshal.UserType(uuid1602df26aef747d6af8f6d0c36fb1a44,756c76796163,71:org.apache.cassandra.db.marshal.UTF8Type,756c76796163:org.apache.cassandra.db.marshal.Int32Type)]} > To KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44, > params=KeyspaceParams{durable_writes=true, > replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy, > replication_factor=3}}, tables=[], views=[], functions=[], > types=[org.apache.cassandra.db.marshal.UserType(uuid1602df26aef747d6af8f6d0c36fb1a44,756c76796163,71:org.apache.cassandra.db.marshal.UTF8Type,756c76796163:org.apache.cassandra.db.marshal.Int32Type)]} > INFO [Native-Transport-Requests-1] 2023-06-26 06:12:24,646 > MigrationManager.java:235 - Create new table: > org.apache.cassandra.config.CFMetaData@132ca1e[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=, > read_repair_chance=0.0, dclocal_read_repair_chance=0.1, > bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, > default_time_to_live=0, memtable_flush_period_in_ms=0, > min_index_interval=128, max_index_interval=2048, > speculative_retry=40PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' > : 'NONE'}, > compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, > options={min_threshold=4, max_threshold=32}}, > compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, > extensions={}, > cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[] > | > [xwtypuqy]],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[xwtypuqy, > jsxcfdobqbmirwxs, fkbelwg],droppedColumns={},triggers=[],indexes=[]] > INFO [MigrationStage:1] 2023-06-26 06:12:31,863 ColumnFamilyStore.java:432 - > Initializing uuid1602df26aef747d6af8f6d0c36fb1a44.q > INFO [MutationStage-5] 2023-06-26 06:12:32,571 ColumnFamilyStore.java:2250 - > Truncating uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh > INFO [MutationStage-5] 2023-06-26 06:12:35,128 ColumnFamilyStore.java:2304 - > Truncate of uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh is complete > INFO [MutationStage-4] 2023-06-26 06:12:35,134 ColumnFamilyStore.java:2250 - > Truncating uuid1602df26aef747d6af8f6d0c36fb1a44.q > INFO [MutationStage-4] 2023-06-26 06:12:37,524 ColumnFamilyStore.java:2304 - > Truncate of uuid1602df26aef747d6af8f6d0c36fb1a44.q is complete > INFO [Native-Transport-Requests-4] 2023-06-26 06:12:37,529 > MigrationManager.java:362 - Drop Keyspace > 'uuidd17c2b7bfea545d0801a3c05e0e5307a' > INFO [Native-Transport-Requests-1] 2023-06-26 06:13:07,622 > MigrationManager.java:313 - Update table > 'uuid1602df26aef747d6af8f6d0c36fb1a44/q' From > org.apache.cassandra.config.CFMetaData@602b98fd[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=, > read_repair_chance=0.0, dclocal_read_repair_chance=0.1, > bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, > default_time_to_live=0, memtable_flush_period_in_ms=0, > min_index_interval=128, max_index_interval=2048, > speculative_retry=40PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' > : 'NONE'}, > compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, > options={max_threshold=32, min_threshold=4}}, > compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, > extensions={}, > cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[] > | > [xwtypuqy]],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[xwtypuqy, > jsxcfdobqbmirwxs, fkbelwg],droppedColumns={},triggers=[],indexes=[]] To > org.apache.cassandra.config.CFMetaData@cfd4a9a[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=, > read_repair_chance=0.0, dclocal_read_repair_chance=0.1, > bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, > default_time_to_live=0, memtable_flush_period_in_ms=0, > min_index_interval=128, max_index_interval=2048, > speculative_retry=40PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' > : 'NONE'}, > compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, > options={max_threshold=32, min_threshold=4}}, > compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, > extensions={}, > cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[] > | > []],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[jsxcfdobqbmirwxs, > fkbelwg],droppedColumns={java.nio.HeapByteBuffer[pos=0 lim=8 > cap=8]=DroppedColumn{name=xwtypuqy, kind=REGULAR, > type=org.apache.cassandra.db.marshal.Int32Type, > droppedTime=1687759987622000}},triggers=[],indexes=[]] > ERROR [OptionalTasks:1] 2023-06-26 06:13:14,070 CassandraDaemon.java:244 - > Exception in thread Thread[OptionalTasks:1,5,main] > java.lang.AssertionError: Unknown keyspace > uuidd17c2b7bfea545d0801a3c05e0e5307a > at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316) > at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129) > at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106) > at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:92) > at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:89) > at com.google.common.collect.Iterators$8.transform(Iterators.java:799) > at > com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48) > at > org.apache.cassandra.db.SizeEstimatesRecorder.run(SizeEstimatesRecorder.java:72) > at > org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) > at java.lang.Thread.run(Thread.java:750) > {code} > > From my understanding, this is a concurrency bug. The ERROR message comes > from *_SizeEstimatesRecorder_* thread when it tries to init and iterate the > keyspaces to get the size information. This thread might be invoked from time > to time, controlled by DebuggableScheduledThreadPoolExecutor. > The error message occurs when there's a thread interleaving between > # Thread1: *_SizeEstimatesRecorder_* thread and > # Thread2: Thread performing {_}drop keyspace{_}. (mutation or migration > thread) > In short, thread1 fetches the keyspace names => thread2 updates the schema by > removing a specific keyspace => thread1 tries to fetch the schema from the > dropped keyspace, and error message is shown. > > More specifically, here's how it happens from my preliminary analysis: > (1) Thread1 tries to iterate all the keyspaces from > {code:java} > public void run() > { > TokenMetadata metadata = > StorageService.instance.getTokenMetadata().cloneOnlyTokenMap(); > if (!metadata.isMember(FBUtilities.getBroadcastAddress())) > { > logger.debug("Node is not part of the ring; not recording size > estimates"); > return; > } logger.trace("Recording size estimates"); > for (Keyspace keyspace : Keyspace.nonLocalStrategy()) > {{code} > It invokes _Schema.instance.getNonLocalStrategyKeyspaces()_ in > _nonLocalStrategy_ function > {code:java} > public static Iterable<Keyspace> nonLocalStrategy() > { > return > Iterables.transform(Schema.instance.getNonLocalStrategyKeyspaces(), > keyspaceTransformer); > } > {code} > getNonLocalStrategyKeyspaces returns a list of *keyspace names* from current > schema. > {code:java} > /** > * @return a collection of keyspaces that do not use LocalStrategy for > replication > */ > public List<String> getNonLocalStrategyKeyspaces() > { > return keyspaces.values().stream() > .filter(keyspace -> keyspace.params.replication.klass != > LocalStrategy.class) > .map(keyspace -> keyspace.name) > .collect(Collectors.toList()); > } {code} > (2) Thread2 performs the _drop keyspace_ and {_}updates the schema{_}. > (3) Thread1: for each keyspace name, it will invoke the <init> function of > Keyspace, when it tries to fetch the metadata from the dropped keyspace name, > it gets null and throws and ERROR exception. > {code:java} > private Keyspace(String keyspaceName, boolean loadSSTables) > { > metadata = Schema.instance.getKSMetaData(keyspaceName); > assert metadata != null : "Unknown keyspace " + keyspaceName; > createReplicationStrategy(metadata); {code} > > This error message won't stop Cassandra from running. But I think this could > be handled considering its ERROR level. E.g. if a user wants to perform a > migration from 3.x to 4.x, [Cassandra migration > doc|https://www.datastax.com/learn/whats-new-for-cassandra-4/migrating-cassandra-4x] > mentions that "All errors need to be handled". This error message could > confuse users about whether the data is still in an integrated state. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org