[ 
https://issues.apache.org/jira/browse/CASSANDRA-18963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788719#comment-17788719
 ] 

Jacek Lewandowski commented on CASSANDRA-18963:
-----------------------------------------------

More details:

Why {{prepared_statements}}? According to the ticket mentioned above, the 
system tables should go to the first data location unless a specific data 
location for those sstables was specified. However, it turns out that it does 
not apply to some system tables, in particular those mentioned in 
{{SystemKeyspace.TABLES_SPLIT_ACROSS_MULTIPLE_DISKS}}, namely: batches, paxos, 
compaction_history, *prepared_statements* and repairs. {{prepared_statements}} 
is probably the only one extensively populated during a normal run thus have 
something to compact. This explains why there were sstables of that table 
spread across multiple data dirs, and why they were being compacted.

Now, why those sstables are moved on startup although they should be excluded 
from the procedure introduced by CASSANDRA-14793? There seem to be an 
oversight/mistake in filtering tables which should be moved:

{code:java}
                        Path[] tableDirectories = 
keyspaceChildren.filter(Files::isDirectory)
                                                                  .filter(p -> 
!SystemKeyspace.TABLES_SPLIT_ACROSS_MULTIPLE_DISKS
                                                                                
              .contains(p.getFileName()
                                                                                
                         .toString()))
                                                                  
.toArray(Path[]::new);
{code}

The directory name of the table obviously includes the table id so no item from 
the set matches it. Therefore sstables of all the tables are moved.

So we should fix two things in this ticket:
- fix the filtering (4.0 and on)
- fix the way how the files are moved - we should skip sstables which are a 
part of an incomplete transaction - this will be removed automatically anyway


> Test Failure: sstableutil_test.TestSSTableUtil
> ----------------------------------------------
>
>                 Key: CASSANDRA-18963
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18963
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/unit
>            Reporter: Berenguer Blasi
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 4.1.x, 5.0-rc
>
>
> Seems flaky 
> [https://app.circleci.com/pipelines/github/driftx/cassandra/1342/workflows/9554c7b5-dd60-4f08-8db5-c954febc8ad6/jobs/58957/tests]
>  * 
> h4. test_abortedcompaction
> sstableutil_test.TestSSTableUtil 
>  
> {code:java}
> self = <sstableutil_test.TestSSTableUtil object at 0x7fc099ecea58> def 
> test_abortedcompaction(self): """ @jira_ticket CASSANDRA-7066 @jira_ticket 
> CASSANDRA-11497 Check that we can cleanup temporary files after a compaction 
> is aborted. """ log_file_name = 'debug.log' cluster = self.cluster 
> cluster.populate(1).start() node = cluster.nodelist()[0] numrecords = 250000 
> self._create_data(node, KeyspaceName, TableName, numrecords) finalfiles, 
> tmpfiles = self._check_files(node, KeyspaceName, TableName) assert 
> len(finalfiles) > 0, "Expected to find some final files" assert 0 == 
> len(tmpfiles), "Expected no tmp files" t = InterruptCompaction(node, 
> TableName, filename=log_file_name, delay=2) t.start() try: 
> logger.debug("Compacting...") node.compact() except ToolError: pass # 
> expected to fail t.join() finalfiles = 
> _normcase_all(self._invoke_sstableutil(KeyspaceName, TableName, 
> type='final')) tmpfiles = 
> _normcase_all(self._invoke_sstableutil(KeyspaceName, TableName, type='tmp')) 
> # In most cases we should end up with some temporary files to clean up, but 
> it may happen # that no temporary files are created if compaction finishes 
> too early or starts too late # see CASSANDRA-11497 logger.debug("Got {} final 
> files and {} tmp files after compaction was interrupted" 
> .format(len(finalfiles), len(tmpfiles))) 
> self._invoke_sstableutil(KeyspaceName, TableName, cleanup=True) 
> self._check_files(node, KeyspaceName, TableName, finalfiles, []) # restart to 
> make sure not data is lost logger.debug("Restarting node...") > 
> node.start(wait_for_binary_proto=True) sstableutil_test.py:97: _ _ _ _ _ _ _ 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:914: in start 
> self.wait_for_binary_interface(from_mark=self.mark) 
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:702: in 
> wait_for_binary_interface self.watch_log_for("Starting listening for CQL 
> clients", **kwargs) ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:599: 
> in watch_log_for self.raise_node_error_if_cassandra_process_is_terminated() _ 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> self = <ccmlib.node.Node object at 0x7fc08a583be0> def 
> raise_node_error_if_cassandra_process_is_terminated(self): if not 
> self._is_pid_running(): msg = "C* process with {pid} is 
> terminated".format(pid=self.pid) common.debug(msg) > raise NodeError(msg) E 
> ccmlib.node.NodeError: C* process with 19530 is terminated 
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:683: NodeError{code}
> {code:java}
> failed on teardown with "Unexpected error found in node logs (see stdout for 
> full details). Errors: [[node1] 'ERROR [SSTableBatchOpen:1] 2023-10-19 
> 22:25:07,449 DefaultFSErrorHandler.java:129 - Exiting forcefully due to file 
> system exception on startup, disk failure policy 
> "stop"\norg.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /tmp/dtest-ahz16wrh/test/node1/data0/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/nc-4-big\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:111)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:397)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:353)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.lambda$openAll$4(SSTableReader.java:414)\n\tat
>  org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)\n\tat 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
>  
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat
>  java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: 
> java.lang.IllegalArgumentException: null\n\tat 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:129)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:79)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:58)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:92)\n\t...
>  10 common frames omitted', [node1] 'ERROR [SSTableBatchOpen:4] 2023-10-19 
> 22:25:07,449 DefaultFSErrorHandler.java:129 - Exiting forcefully due to file 
> system exception on startup, disk failure policy 
> "stop"\norg.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /tmp/dtest-ahz16wrh/test/node1/data0/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/nc-5-big\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:111)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:397)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:353)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.lambda$openAll$4(SSTableReader.java:414)\n\tat
>  org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)\n\tat 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
>  
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat
>  java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: 
> java.lang.IllegalArgumentException: null\n\tat 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:129)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:79)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:58)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:92)\n\t...
>  10 common frames omitted', [node1] 'ERROR [SSTableBatchOpen:4] 2023-10-19 
> 22:25:07,450 SSTableReader.java:419 - Corrupt sstable 
> /tmp/dtest-ahz16wrh/test/node1/data0/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/nc-5-big=[Index.db,
>  Data.db]; skipping 
> table\norg.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /tmp/dtest-ahz16wrh/test/node1/data0/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/nc-5-big\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:111)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:397)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:353)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.lambda$openAll$4(SSTableReader.java:414)\n\tat
>  org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)\n\tat 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
>  
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat
>  java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: 
> java.lang.IllegalArgumentException: null\n\tat 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:129)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:79)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:58)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:92)\n\t...
>  10 common frames omitted']" Unexpected error found in node logs (see stdout 
> for full details). Errors: [[node1] 'ERROR [SSTableBatchOpen:1] 2023-10-19 
> 22:25:07,449 DefaultFSErrorHandler.java:129 - Exiting forcefully due to file 
> system exception on startup, disk failure policy 
> "stop"\norg.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /tmp/dtest-ahz16wrh/test/node1/data0/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/nc-4-big\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:111)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:397)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:353)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.lambda$openAll$4(SSTableReader.java:414)\n\tat
>  org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)\n\tat 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
>  
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat
>  java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: 
> java.lang.IllegalArgumentException: null\n\tat 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:129)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:79)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:58)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:92)\n\t...
>  10 common frames omitted', [node1] 'ERROR [SSTableBatchOpen:4] 2023-10-19 
> 22:25:07,449 DefaultFSErrorHandler.java:129 - Exiting forcefully due to file 
> system exception on startup, disk failure policy 
> "stop"\norg.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /tmp/dtest-ahz16wrh/test/node1/data0/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/nc-5-big\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:111)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:397)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:353)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.lambda$openAll$4(SSTableReader.java:414)\n\tat
>  org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)\n\tat 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
>  
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat
>  java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: 
> java.lang.IllegalArgumentException: null\n\tat 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:129)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:79)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:58)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:92)\n\t...
>  10 common frames omitted', [node1] 'ERROR [SSTableBatchOpen:4] 2023-10-19 
> 22:25:07,450 SSTableReader.java:419 - Corrupt sstable 
> /tmp/dtest-ahz16wrh/test/node1/data0/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/nc-5-big=[Index.db,
>  Data.db]; skipping 
> table\norg.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /tmp/dtest-ahz16wrh/test/node1/data0/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/nc-5-big\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:111)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:397)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:353)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.lambda$openAll$4(SSTableReader.java:414)\n\tat
>  org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)\n\tat 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
>  
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat
>  java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: 
> java.lang.IllegalArgumentException: null\n\tat 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:129)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:79)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:58)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:92)\n\t...
>  10 common frames omitted']{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to