[ 
https://issues.apache.org/jira/browse/CASSANDRA-18636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765898#comment-17765898
 ] 

Ke Han commented on CASSANDRA-18636:
------------------------------------

Hi [~smiklosovic], I found another possible thread interleaving 
(CASSANDRA-18858) which could lead to a similar error message: Interleaving 
between thread updating the metadata and thread performing drain.

I think your patch might be able to fix both of them if exhausting the other 
possible references to the old copy of keyspaces.

 

> AssertionError: Unknown keyspace after dropping a keyspace
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-18636
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18636
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ke Han
>            Priority: Normal
>         Attachments: system.log
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When running Cassandra cluster (3 nodes) in 3.11.15, I noticed that there is 
> an ERROR message in system.log after drop keyspace command is executed. 
> {code:java}
> INFO  [Native-Transport-Requests-1] 2023-06-26 06:11:48,575 
> MigrationManager.java:205 - Create new Keyspace: 
> KeyspaceMetadata{name=uuide1b80964f06a424c8a57884564127745, 
> params=KeyspaceParams{durable_writes=true, 
> replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
>  replication_factor=2}}, tables=[], views=[], functions=[], types=[]}
> INFO  [Native-Transport-Requests-1] 2023-06-26 06:11:53,106 
> MigrationManager.java:205 - Create new Keyspace: 
> KeyspaceMetadata{name=uuidd17c2b7bfea545d0801a3c05e0e5307a, 
> params=KeyspaceParams{durable_writes=true, 
> replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
>  replication_factor=1}}, tables=[], views=[], functions=[], types=[]}
> INFO  [Native-Transport-Requests-2] 2023-06-26 06:11:56,673 
> MigrationManager.java:235 - Create new table: 
> org.apache.cassandra.config.CFMetaData@705b690b[cfId=5a7fcf10-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuidd17c2b7bfea545d0801a3c05e0e5307a,cfName=vzkzguyh,flags=[COMPOUND],params=TableParams{comment=,
>  read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
> bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
> default_time_to_live=0, memtable_flush_period_in_ms=0, 
> min_index_interval=128, max_index_interval=2048, speculative_retry=50.0ms, 
> caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, 
> compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
>  options={min_threshold=4, max_threshold=32}}, 
> compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
> extensions={}, cdc=false},comparator=comparator(),partitionColumns=[[] | 
> [condcduzca o ukwhjwbdljqnz 
> vzkzguyh]],partitionKeyColumns=[gveohmecfdwx],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.UTF8Type,columnMetadata=[gveohmecfdwx,
>  condcduzca, vzkzguyh, o, 
> ukwhjwbdljqnz],droppedColumns={},triggers=[],indexes=[]]
> INFO  [MigrationStage:1] 2023-06-26 06:12:03,118 ColumnFamilyStore.java:432 - 
> Initializing uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh
> INFO  [Native-Transport-Requests-9] 2023-06-26 06:12:04,634 
> MigrationManager.java:235 - Create new table: 
> org.apache.cassandra.config.CFMetaData@61f10dd7[cfId=5f3e8fa0-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuidd17c2b7bfea545d0801a3c05e0e5307a,cfName=o,flags=[COMPOUND],params=TableParams{comment=,
>  read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
> bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
> default_time_to_live=0, memtable_flush_period_in_ms=0, 
> min_index_interval=128, max_index_interval=2048, speculative_retry=ALWAYS, 
> caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, 
> compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
>  options={min_threshold=4, max_threshold=32}}, 
> compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
> extensions={}, cdc=false},comparator=comparator(),partitionColumns=[[] | 
> [dzwqnsv gveohmecfdwx o 
> ukwhjwbdljqnz]],partitionKeyColumns=[vzkzguyh],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[dzwqnsv,
>  vzkzguyh, gveohmecfdwx, o, 
> ukwhjwbdljqnz],droppedColumns={},triggers=[],indexes=[]]
> INFO  [MigrationStage:1] 2023-06-26 06:12:11,862 ColumnFamilyStore.java:432 - 
> Initializing uuidd17c2b7bfea545d0801a3c05e0e5307a.o
> INFO  [Native-Transport-Requests-6] 2023-06-26 06:12:13,165 
> MigrationManager.java:205 - Create new Keyspace: 
> KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44, 
> params=KeyspaceParams{durable_writes=true, 
> replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
>  replication_factor=1}}, tables=[], views=[], functions=[], types=[]}
> INFO  [MigrationStage:1] 2023-06-26 06:12:20,413 
> MigrationCoordinator.java:531 - Sending schema pull request to /192.168.103.3 
> at 1687759940413 with timeout 4287
> INFO  [Native-Transport-Requests-3] 2023-06-26 06:12:21,270 
> MigrationManager.java:286 - Update Keyspace 
> 'uuid1602df26aef747d6af8f6d0c36fb1a44' >From 
> KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44, 
> params=KeyspaceParams{durable_writes=true, 
> replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
>  replication_factor=1}}, tables=[], views=[], functions=[], 
> types=[org.apache.cassandra.db.marshal.UserType(uuid1602df26aef747d6af8f6d0c36fb1a44,756c76796163,71:org.apache.cassandra.db.marshal.UTF8Type,756c76796163:org.apache.cassandra.db.marshal.Int32Type)]}
>  To KeyspaceMetadata{name=uuid1602df26aef747d6af8f6d0c36fb1a44, 
> params=KeyspaceParams{durable_writes=true, 
> replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy,
>  replication_factor=3}}, tables=[], views=[], functions=[], 
> types=[org.apache.cassandra.db.marshal.UserType(uuid1602df26aef747d6af8f6d0c36fb1a44,756c76796163,71:org.apache.cassandra.db.marshal.UTF8Type,756c76796163:org.apache.cassandra.db.marshal.Int32Type)]}
> INFO  [Native-Transport-Requests-1] 2023-06-26 06:12:24,646 
> MigrationManager.java:235 - Create new table: 
> org.apache.cassandra.config.CFMetaData@132ca1e[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=,
>  read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
> bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
> default_time_to_live=0, memtable_flush_period_in_ms=0, 
> min_index_interval=128, max_index_interval=2048, 
> speculative_retry=40PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' 
> : 'NONE'}, 
> compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
>  options={min_threshold=4, max_threshold=32}}, 
> compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
> extensions={}, 
> cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[]
>  | 
> [xwtypuqy]],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[xwtypuqy,
>  jsxcfdobqbmirwxs, fkbelwg],droppedColumns={},triggers=[],indexes=[]]
> INFO  [MigrationStage:1] 2023-06-26 06:12:31,863 ColumnFamilyStore.java:432 - 
> Initializing uuid1602df26aef747d6af8f6d0c36fb1a44.q
> INFO  [MutationStage-5] 2023-06-26 06:12:32,571 ColumnFamilyStore.java:2250 - 
> Truncating uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh
> INFO  [MutationStage-5] 2023-06-26 06:12:35,128 ColumnFamilyStore.java:2304 - 
> Truncate of uuidd17c2b7bfea545d0801a3c05e0e5307a.vzkzguyh is complete
> INFO  [MutationStage-4] 2023-06-26 06:12:35,134 ColumnFamilyStore.java:2250 - 
> Truncating uuid1602df26aef747d6af8f6d0c36fb1a44.q
> INFO  [MutationStage-4] 2023-06-26 06:12:37,524 ColumnFamilyStore.java:2304 - 
> Truncate of uuid1602df26aef747d6af8f6d0c36fb1a44.q is complete
> INFO  [Native-Transport-Requests-4] 2023-06-26 06:12:37,529 
> MigrationManager.java:362 - Drop Keyspace 
> 'uuidd17c2b7bfea545d0801a3c05e0e5307a'
> INFO  [Native-Transport-Requests-1] 2023-06-26 06:13:07,622 
> MigrationManager.java:313 - Update table 
> 'uuid1602df26aef747d6af8f6d0c36fb1a44/q' From 
> org.apache.cassandra.config.CFMetaData@602b98fd[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=,
>  read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
> bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
> default_time_to_live=0, memtable_flush_period_in_ms=0, 
> min_index_interval=128, max_index_interval=2048, 
> speculative_retry=40PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' 
> : 'NONE'}, 
> compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
>  options={max_threshold=32, min_threshold=4}}, 
> compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
> extensions={}, 
> cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[]
>  | 
> [xwtypuqy]],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[xwtypuqy,
>  jsxcfdobqbmirwxs, fkbelwg],droppedColumns={},triggers=[],indexes=[]] To 
> org.apache.cassandra.config.CFMetaData@cfd4a9a[cfId=6b2c2660-13e8-11ee-9cb7-e10fdc7464a5,ksName=uuid1602df26aef747d6af8f6d0c36fb1a44,cfName=q,flags=[COMPOUND],params=TableParams{comment=,
>  read_repair_chance=0.0, dclocal_read_repair_chance=0.1, 
> bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, 
> default_time_to_live=0, memtable_flush_period_in_ms=0, 
> min_index_interval=128, max_index_interval=2048, 
> speculative_retry=40PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' 
> : 'NONE'}, 
> compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
>  options={max_threshold=32, min_threshold=4}}, 
> compression=org.apache.cassandra.schema.CompressionParams@a8f0c342, 
> extensions={}, 
> cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.UTF8Type),partitionColumns=[[]
>  | 
> []],partitionKeyColumns=[fkbelwg],clusteringColumns=[jsxcfdobqbmirwxs],keyValidator=org.apache.cassandra.db.marshal.Int32Type,columnMetadata=[jsxcfdobqbmirwxs,
>  fkbelwg],droppedColumns={java.nio.HeapByteBuffer[pos=0 lim=8 
> cap=8]=DroppedColumn{name=xwtypuqy, kind=REGULAR, 
> type=org.apache.cassandra.db.marshal.Int32Type, 
> droppedTime=1687759987622000}},triggers=[],indexes=[]]
> ERROR [OptionalTasks:1] 2023-06-26 06:13:14,070 CassandraDaemon.java:244 - 
> Exception in thread Thread[OptionalTasks:1,5,main]
> java.lang.AssertionError: Unknown keyspace 
> uuidd17c2b7bfea545d0801a3c05e0e5307a
>         at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316)
>         at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129)
>         at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106)
>         at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:92)
>         at org.apache.cassandra.db.Keyspace$1.apply(Keyspace.java:89)
>         at com.google.common.collect.Iterators$8.transform(Iterators.java:799)
>         at 
> com.google.common.collect.TransformedIterator.next(TransformedIterator.java:48)
>         at 
> org.apache.cassandra.db.SizeEstimatesRecorder.run(SizeEstimatesRecorder.java:72)
>         at 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
>         at java.lang.Thread.run(Thread.java:750)
> {code}
>  
> From my understanding, this is a concurrency bug. The ERROR message comes 
> from *_SizeEstimatesRecorder_* thread when it tries to init and iterate the 
> keyspaces to get the size information. This thread might be invoked from time 
> to time, controlled by DebuggableScheduledThreadPoolExecutor.
> The error message occurs when there's a thread interleaving between
>  # Thread1: *_SizeEstimatesRecorder_* thread and
>  # Thread2: Thread performing {_}drop keyspace{_}. (mutation or migration 
> thread)
> In short, thread1 fetches the keyspace names => thread2 updates the schema by 
> removing a specific keyspace => thread1 tries to fetch the schema from the 
> dropped keyspace, and error message is shown.
>  
> More specifically, here's how it happens from my preliminary analysis:
> (1) Thread1 tries to iterate all the keyspaces from
> {code:java}
> public void run()
> {
>     TokenMetadata metadata = 
> StorageService.instance.getTokenMetadata().cloneOnlyTokenMap();
>     if (!metadata.isMember(FBUtilities.getBroadcastAddress()))
>     {
>         logger.debug("Node is not part of the ring; not recording size 
> estimates");
>         return;
>     }    logger.trace("Recording size estimates");    
>     for (Keyspace keyspace : Keyspace.nonLocalStrategy())
>     {{code}
> It invokes _Schema.instance.getNonLocalStrategyKeyspaces()_ in 
> _nonLocalStrategy_ function
> {code:java}
> public static Iterable<Keyspace> nonLocalStrategy()
> {
>     return 
> Iterables.transform(Schema.instance.getNonLocalStrategyKeyspaces(), 
> keyspaceTransformer);
> }
> {code}
> getNonLocalStrategyKeyspaces returns a list of *keyspace names* from current 
> schema.
> {code:java}
> /**
>  * @return a collection of keyspaces that do not use LocalStrategy for 
> replication
>  */
> public List<String> getNonLocalStrategyKeyspaces()
> {
>     return keyspaces.values().stream()
>             .filter(keyspace -> keyspace.params.replication.klass != 
> LocalStrategy.class)
>             .map(keyspace -> keyspace.name)
>             .collect(Collectors.toList());
> } {code}
> (2) Thread2 performs the _drop keyspace_ and {_}updates the schema{_}. 
> (3) Thread1: for each keyspace name, it will invoke the <init> function of 
> Keyspace, when it tries to fetch the metadata from the dropped keyspace name, 
> it gets null and throws and ERROR exception.
> {code:java}
> private Keyspace(String keyspaceName, boolean loadSSTables)
> {
>     metadata = Schema.instance.getKSMetaData(keyspaceName);
>     assert metadata != null : "Unknown keyspace " + keyspaceName;
>     createReplicationStrategy(metadata); {code}
>  
> This error message won't stop Cassandra from running. But I think this could 
> be handled considering its ERROR level. E.g. if a user wants to perform a 
> migration from 3.x to 4.x, [Cassandra migration 
> doc|https://www.datastax.com/learn/whats-new-for-cassandra-4/migrating-cassandra-4x]
>  mentions that "All errors need to be handled". This error message could 
> confuse users about whether the data is still in an integrated state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to