[ https://issues.apache.org/jira/browse/CASSANDRA-18119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17656282#comment-17656282 ]
Josh McKenzie commented on CASSANDRA-18119: ------------------------------------------- +1 on the code change. Looks like the trunk and 4.1 ci runs have failures in repeat runs on LogTransactionTest.java; the one on 4.1 [looks like it might be related|https://app.circleci.com/pipelines/github/krummas/cassandra/846/workflows/f9ff701a-22f6-463b-b356-c37a61d24a75/jobs/7054/tests] to this change here; it now chains down to {code:java} static boolean removeUnfinishedLeftovers(Map.Entry<String, List<File>> entry) { try(LogFile txn = LogFile.make(entry.getKey(), entry.getValue())) { logger.info("Verifying logfile transaction {}", txn); // We don't check / include the stats file timestamp on LogRecord creation / verification as that might // be modified by a race in compaction notification and then needlessly fail subsequent node starts. if (txn.verify(true)) // REVIEWER NOTE: this param leading to skipping TS calc on stats file is what we changed here{code} >From the logs on the test failure, looks like it was expecting true (files >found and deleted correctly) but received fals: {code:java} 34409 [junit-timeout] ERROR [main] 2022-12-13 15:45:01,465 LogFile.java:170 - Failed to read records for transaction log [nb_txn_compaction_1ae0fb60-7afd-11ed-bd8f-0b2a035c989d.log in build/test/cassandra/data/TransactionLogsTest/mockcf39-1add51e07afd11edbd8f0b2a035c989d/1, build/test/cassandra/data/TransactionLogs 34410 [junit-timeout] ERROR [main] 2022-12-13 15:45:01,465 LogTransaction.java:561 - Unexpected disk state: failed to read transaction log [nb_txn_compaction_1ae0fb60-7afd-11ed-bd8f-0b2a035c989d.log in build/test/cassandra/data/TransactionLogsTest/mockcf39-1add51e07afd11edbd8f0b2a035c989d/1, build/test/cassandra/da 34411 [junit-timeout] Files and contents follow:^M 34412 [junit-timeout] build/test/cassandra/data/TransactionLogsTest/mockcf39-1add51e07afd11edbd8f0b2a035c989d/1/nb_txn_compaction_1ae0fb60-7afd-11ed-bd8f-0b2a035c989d.log^M 34413 [junit-timeout] REMOVE:[/tmp/cassandra/build/test/cassandra/data/TransactionLogsTest/mockcf39-1add51e07afd11edbd8f0b2a035c989d/1/nb-0-big-,1670946301437,5][2873235910]^M 34414 [junit-timeout] REMOVE:[/tmp/cassandra/build/test/cassandra/data/TransactionLogsTest/mockcf39-1add51e07afd11edbd8f0b2a035c989d/2/nb-2-big-,1670946301441,5][1283732776]^M 34415 [junit-timeout] ADD:[/tmp/cassandra/build/test/cassandra/data/TransactionLogsTest/mockcf39-1add51e07afd11edbd8f0b2a035c989d/1/nb-1-big-,0,5][1197593494]^M 34416 [junit-timeout] ADD:[/tmp/cassandra/build/test/cassandra/data/TransactionLogsTest/mockcf39-1add51e07afd11edbd8f0b2a035c989d/2/nb-3-big-,0,5][1374830355]^M 34417 [junit-timeout] COMMIT:[,0,0][2613697770]^M 34418 [junit-timeout] ***This record should have been the last one in all replicas^M 34419 [junit-timeout] build/test/cassandra/data/TransactionLogsTest/mockcf39-1add51e07afd11edbd8f0b2a035c989d/2/nb_txn_compaction_1ae0fb60-7afd-11ed-bd8f-0b2a035c989d.log^M 34420 [junit-timeout] REMOVE:[/tmp/cassandra/build/test/cassandra/data/TransactionLogsTest/mockcf39-1add51e07afd11edbd8f0b2a035c989d/1/nb-0-big-,1670946301437,5][2873235910]^M 34421 [junit-timeout] REMOVE:[/tmp/cassandra/build/test/cassandra/data/TransactionLogsTest/mockcf39-1add51e07afd11edbd8f0b2a035c989d/2/nb-2-big-,1670946301441,5][1283732776]^M 34422 [junit-timeout] ADD:[/tmp/cassandra/build/test/cassandra/data/TransactionLogsTest/mockcf39-1add51e07afd11edbd8f0b2a035c989d/1/nb-1-big-,0,5][1197593494]^M 34423 [junit-timeout] ADD:[/tmp/cassandra/build/test/cassandra/data/TransactionLogsTest/mockcf39-1add51e07afd11edbd8f0b2a035c989d/2/nb-3-big-,0,5][1374830355]^M 34424 [junit-timeout] COMMIT:[,0,0][2613697770]^M 34425 [junit-timeout] ***This record should have been the last one in all replicas^M 34426 [junit-timeout] COMMIT:[,0,0][2613697770]^M 34427 [junit-timeout] ***This record should have been the last one in all replicas^M 34428 [junit-timeout] ^M {code} Not clear to me how the change in this patch could have contributed to that; might be worth multiplexing this unit test on trunk w/out the change to see if there's a pre-existing race / issue in there. The repeated leaks of the TransactionTidier from LogTransaction throughout the repeated runs also look a smidge suspicious: {code:java} 34435 [junit-timeout] ERROR [Reference-Reaper] 2022-12-13 15:45:01,466 Ref.java:237 - LEAK DETECTED: a reference (class org.apache.cassandra.db.lifecycle.LogTransaction$TransactionTidier@1776643200:[nb_txn_compaction_198e1590-7afd-11ed-bd8f-0b2a035c989d.log in /tmp/cassandra/build/test/cassandra/data/TransactionLog {code} > Handle sstable metadata stats file getting a new mtime after compaction has > finished > ------------------------------------------------------------------------------------ > > Key: CASSANDRA-18119 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18119 > Project: Cassandra > Issue Type: Bug > Components: Local/Compaction, Local/Startup and Shutdown > Reporter: Marcus Eriksson > Assignee: Marcus Eriksson > Priority: Normal > Fix For: 3.11.x, 4.0.x, 4.1.x, 4.x > > Time Spent: 40m > Remaining Estimate: 0h > > Due to a race between compaction finishing and compaction strategies getting > reloaded there is a chance that we try to add both the new sstable and the > old compacted sstable to the compaction strategy, and in the LCS case this > can cause the old sstable to get sent to L0 to avoid overlap. This changes > the mtime of the stats metadata file and if the node is shut down before the > sstable is actually deleted from disk, we fail starting with the following > exception: > {code} > .../mockcf1-392b3ff07c5a11ed8c662f5760cb10b3/nb_txn_compaction_3983c030-7c5a-11ed-8c66-2f5760cb10b3.log > REMOVE:[.../data/TransactionLogsTest/mockcf1-392b3ff07c5a11ed8c662f5760cb10b3/nb-0-big-,1671096247000,5][4003386800] > ***Unexpected files detected for sstable [nb-0-big-]: last update > time [Thu Dec 15 10:24:09 CET 2022] (1671096249000) should have been [Thu Dec > 15 10:24:07 CET 2022] (1671096247000) > ADD:[.../data/TransactionLogsTest/mockcf1-392b3ff07c5a11ed8c662f5760cb10b3/nb-2-big-,0,5][319189529] > {code} > A workaround for this (until we properly fix the way compaction strategies > get notified about sstable changes) is to ignore the timestamp of the STATS > component when cleaning up compaction leftovers on startup. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org