[ 
https://issues.apache.org/jira/browse/CASSANDRA-18963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17788464#comment-17788464
 ] 

Jacek Lewandowski commented on CASSANDRA-18963:
-----------------------------------------------

It looks like that the problem is caused by the fact we try to open an sstable 
which misses STATS component. We can see that from "moving..." log entries - 
when the node starts, it moves sstables of the system keyspace from non-first 
to the first data location - thanks to that fact I noticed that there is no 
STATS file moved for the problematic table. 

Another observation is that, the table in question was being compacted when the 
compactions were interrupted and the process was stopped. I'd expect that the 
incomplete sstable is marked as such in the transaction log file and then it is 
skipped during scan when the node restarts. I'd like to try that scenario in 
the isolated test - see the state of the transaction log before the stats 
component is written, and then, if such transaction log prevents such sstable 
from loading.


> Test Failure: sstableutil_test.TestSSTableUtil
> ----------------------------------------------
>
>                 Key: CASSANDRA-18963
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18963
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/unit
>            Reporter: Berenguer Blasi
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 4.1.x, 5.0-beta
>
>
> Seems flaky 
> [https://app.circleci.com/pipelines/github/driftx/cassandra/1342/workflows/9554c7b5-dd60-4f08-8db5-c954febc8ad6/jobs/58957/tests]
>  * 
> h4. test_abortedcompaction
> sstableutil_test.TestSSTableUtil 
>  
> {code:java}
> self = <sstableutil_test.TestSSTableUtil object at 0x7fc099ecea58> def 
> test_abortedcompaction(self): """ @jira_ticket CASSANDRA-7066 @jira_ticket 
> CASSANDRA-11497 Check that we can cleanup temporary files after a compaction 
> is aborted. """ log_file_name = 'debug.log' cluster = self.cluster 
> cluster.populate(1).start() node = cluster.nodelist()[0] numrecords = 250000 
> self._create_data(node, KeyspaceName, TableName, numrecords) finalfiles, 
> tmpfiles = self._check_files(node, KeyspaceName, TableName) assert 
> len(finalfiles) > 0, "Expected to find some final files" assert 0 == 
> len(tmpfiles), "Expected no tmp files" t = InterruptCompaction(node, 
> TableName, filename=log_file_name, delay=2) t.start() try: 
> logger.debug("Compacting...") node.compact() except ToolError: pass # 
> expected to fail t.join() finalfiles = 
> _normcase_all(self._invoke_sstableutil(KeyspaceName, TableName, 
> type='final')) tmpfiles = 
> _normcase_all(self._invoke_sstableutil(KeyspaceName, TableName, type='tmp')) 
> # In most cases we should end up with some temporary files to clean up, but 
> it may happen # that no temporary files are created if compaction finishes 
> too early or starts too late # see CASSANDRA-11497 logger.debug("Got {} final 
> files and {} tmp files after compaction was interrupted" 
> .format(len(finalfiles), len(tmpfiles))) 
> self._invoke_sstableutil(KeyspaceName, TableName, cleanup=True) 
> self._check_files(node, KeyspaceName, TableName, finalfiles, []) # restart to 
> make sure not data is lost logger.debug("Restarting node...") > 
> node.start(wait_for_binary_proto=True) sstableutil_test.py:97: _ _ _ _ _ _ _ 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:914: in start 
> self.wait_for_binary_interface(from_mark=self.mark) 
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:702: in 
> wait_for_binary_interface self.watch_log_for("Starting listening for CQL 
> clients", **kwargs) ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:599: 
> in watch_log_for self.raise_node_error_if_cassandra_process_is_terminated() _ 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> self = <ccmlib.node.Node object at 0x7fc08a583be0> def 
> raise_node_error_if_cassandra_process_is_terminated(self): if not 
> self._is_pid_running(): msg = "C* process with {pid} is 
> terminated".format(pid=self.pid) common.debug(msg) > raise NodeError(msg) E 
> ccmlib.node.NodeError: C* process with 19530 is terminated 
> ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:683: NodeError{code}
> {code:java}
> failed on teardown with "Unexpected error found in node logs (see stdout for 
> full details). Errors: [[node1] 'ERROR [SSTableBatchOpen:1] 2023-10-19 
> 22:25:07,449 DefaultFSErrorHandler.java:129 - Exiting forcefully due to file 
> system exception on startup, disk failure policy 
> "stop"\norg.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /tmp/dtest-ahz16wrh/test/node1/data0/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/nc-4-big\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:111)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:397)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:353)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.lambda$openAll$4(SSTableReader.java:414)\n\tat
>  org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)\n\tat 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
>  
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat
>  java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: 
> java.lang.IllegalArgumentException: null\n\tat 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:129)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:79)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:58)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:92)\n\t...
>  10 common frames omitted', [node1] 'ERROR [SSTableBatchOpen:4] 2023-10-19 
> 22:25:07,449 DefaultFSErrorHandler.java:129 - Exiting forcefully due to file 
> system exception on startup, disk failure policy 
> "stop"\norg.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /tmp/dtest-ahz16wrh/test/node1/data0/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/nc-5-big\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:111)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:397)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:353)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.lambda$openAll$4(SSTableReader.java:414)\n\tat
>  org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)\n\tat 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
>  
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat
>  java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: 
> java.lang.IllegalArgumentException: null\n\tat 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:129)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:79)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:58)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:92)\n\t...
>  10 common frames omitted', [node1] 'ERROR [SSTableBatchOpen:4] 2023-10-19 
> 22:25:07,450 SSTableReader.java:419 - Corrupt sstable 
> /tmp/dtest-ahz16wrh/test/node1/data0/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/nc-5-big=[Index.db,
>  Data.db]; skipping 
> table\norg.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /tmp/dtest-ahz16wrh/test/node1/data0/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/nc-5-big\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:111)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:397)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:353)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.lambda$openAll$4(SSTableReader.java:414)\n\tat
>  org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)\n\tat 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
>  
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat
>  java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: 
> java.lang.IllegalArgumentException: null\n\tat 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:129)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:79)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:58)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:92)\n\t...
>  10 common frames omitted']" Unexpected error found in node logs (see stdout 
> for full details). Errors: [[node1] 'ERROR [SSTableBatchOpen:1] 2023-10-19 
> 22:25:07,449 DefaultFSErrorHandler.java:129 - Exiting forcefully due to file 
> system exception on startup, disk failure policy 
> "stop"\norg.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /tmp/dtest-ahz16wrh/test/node1/data0/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/nc-4-big\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:111)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:397)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:353)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.lambda$openAll$4(SSTableReader.java:414)\n\tat
>  org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)\n\tat 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
>  
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat
>  java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: 
> java.lang.IllegalArgumentException: null\n\tat 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:129)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:79)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:58)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:92)\n\t...
>  10 common frames omitted', [node1] 'ERROR [SSTableBatchOpen:4] 2023-10-19 
> 22:25:07,449 DefaultFSErrorHandler.java:129 - Exiting forcefully due to file 
> system exception on startup, disk failure policy 
> "stop"\norg.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /tmp/dtest-ahz16wrh/test/node1/data0/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/nc-5-big\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:111)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:397)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:353)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.lambda$openAll$4(SSTableReader.java:414)\n\tat
>  org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)\n\tat 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
>  
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat
>  java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: 
> java.lang.IllegalArgumentException: null\n\tat 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:129)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:79)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:58)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:92)\n\t...
>  10 common frames omitted', [node1] 'ERROR [SSTableBatchOpen:4] 2023-10-19 
> 22:25:07,450 SSTableReader.java:419 - Corrupt sstable 
> /tmp/dtest-ahz16wrh/test/node1/data0/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/nc-5-big=[Index.db,
>  Data.db]; skipping 
> table\norg.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /tmp/dtest-ahz16wrh/test/node1/data0/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/nc-5-big\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:111)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:397)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:353)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReader.lambda$openAll$4(SSTableReader.java:414)\n\tat
>  org.apache.cassandra.concurrent.FutureTask$1.call(FutureTask.java:96)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)\n\tat 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)\n\tat 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
>  
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
>  
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat
>  java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: 
> java.lang.IllegalArgumentException: null\n\tat 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:129)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:79)\n\tat
>  
> org.apache.cassandra.io.sstable.format.big.BigSSTableReaderLoadingBuilder.openComponents(BigSSTableReaderLoadingBuilder.java:58)\n\tat
>  
> org.apache.cassandra.io.sstable.format.SSTableReaderLoadingBuilder.build(SSTableReaderLoadingBuilder.java:92)\n\t...
>  10 common frames omitted']{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to