[ https://issues.apache.org/jira/browse/CASSANDRA-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840220#comment-17840220 ]
Stefan Miklosovic edited comment on CASSANDRA-19572 at 4/23/24 8:27 PM:
------------------------------------------------------------------------

OK, so more digging... I tried putting "SSTableReader.resetTidying();" into each afterTest, and it did help. Below is each job with 5k repetitions:

4.0: https://app.circleci.com/pipelines/github/instaclustr/cassandra/4223/workflows/a82d0483-a0df-44ed-8127-088b303c78ba/jobs/225432/steps
4.1: https://app.circleci.com/pipelines/github/instaclustr/cassandra/4224/workflows/eae7a5e2-89dd-46cd-aaca-1e4250d0fa8b/jobs/225531/steps
5.0 j11: https://app.circleci.com/pipelines/github/instaclustr/cassandra/4226/workflows/9805ec75-fd02-4c5a-8996-fa5bce71e8c2/jobs/225728/steps
5.0 j17: https://app.circleci.com/pipelines/github/instaclustr/cassandra/4226/workflows/9805ec75-fd02-4c5a-8996-fa5bce71e8c2/jobs/225727/steps

However, I just noticed that there is already an afterTest in CQLTester, which ImportTest extends, and I was _not_ calling it (super.afterTest()) from my afterTest. What CQLTester's afterTest does is this (1): it removes the tables and deletes all SSTables on disk, so I guess it also triggers tidying, just by other means. But that whole operation runs in ScheduledExecutors.optionalTasks, which is asynchronous.

So what happens when we run a test method, afterTest is invoked, and the removal is done asynchronously? JUnit does not wait until it is finished, right? I think this work might then leak beyond the scope of afterTest while the next test is already running, etc. I feel uneasy about this, and it is probably the real cause of the issues we see with these refs.

What I am doing right now is tidying up before calling super.afterTest() and running the multiplexer on 4.0 again. If it fails, I guess the next step will be to run the logic in afterTest synchronously.
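A minimal, self-contained sketch of the suspected race, assuming a single-threaded executor as a stand-in for ScheduledExecutors.optionalTasks. The class, methods and fields (AsyncTeardownSketch, afterTestAsync, afterTestSync, nextTestSeesLeftovers, the sstables set) are hypothetical and are not Cassandra APIs. It contrasts a teardown that only schedules the cleanup with one that blocks on the returned Future, which is one possible way to "run the logic in afterTest synchronously".

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of the suspected teardown race. "sstables" stands in for
// on-disk state and "optionalTasks" for an asynchronous executor such as
// ScheduledExecutors.optionalTasks; none of these names are Cassandra APIs.
final class AsyncTeardownSketch
{
    static final Set<String> sstables = ConcurrentHashMap.newKeySet();

    // Single daemon thread, so pending cleanup never keeps the JVM alive.
    static final ExecutorService optionalTasks = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "optional-tasks");
        t.setDaemon(true);
        return t;
    });

    // Asynchronous teardown: schedules the cleanup and returns at once, so a
    // caller that ignores the Future (as JUnit would) cannot know when it ends.
    static Future<?> afterTestAsync()
    {
        return optionalTasks.submit(() -> {
            sleepQuietly(200); // simulate a slow table drop / file deletion
            sstables.clear();
        });
    }

    // Synchronous teardown: same work, but block until it has completed.
    static void afterTestSync()
    {
        try
        {
            afterTestAsync().get(10, TimeUnit.SECONDS);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for teardown", e);
        }
        catch (ExecutionException | TimeoutException e)
        {
            throw new IllegalStateException("teardown did not complete", e);
        }
    }

    // "Test 1" leaves an sstable behind, teardown runs, then "test 2" starts
    // immediately and checks whether test 1's state is still visible.
    static boolean nextTestSeesLeftovers(boolean synchronousTeardown)
    {
        sstables.add("test1-sstable");
        if (synchronousTeardown)
            afterTestSync();
        else
            afterTestAsync(); // fire and forget
        return sstables.contains("test1-sstable");
    }

    private static void sleepQuietly(long millis)
    {
        try
        {
            Thread.sleep(millis);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args)
    {
        // Usually true: the check runs before the 200 ms cleanup has finished.
        System.out.println("async teardown, leftovers visible: " + nextTestSeesLeftovers(false));
        // Always false: the cleanup has completed before the check.
        System.out.println("sync teardown, leftovers visible: " + nextTestSeesLeftovers(true));
    }
}
```

The design keeps the cleanup on the executor but has the teardown wait on the returned Future (with a timeout), so the next test cannot start before the cleanup finishes. Whether CQLTester's real cleanup path exposes such a Future is not confirmed here.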
(1) https://github.com/apache/cassandra/blob/cassandra-4.1/test/unit/org/apache/cassandra/cql3/CQLTester.java#L417-L433

(Author: smiklosovic)

> Test failure: org.apache.cassandra.db.ImportTest flakiness
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-19572
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19572
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tool/bulk load
>            Reporter: Brandon Williams
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> As discovered on CASSANDRA-19401, the tests in this class are flaky, at least
> the following:
> * testImportCorruptWithoutValidationWithCopying
> * testImportInvalidateCache
> * testImportCorruptWithCopying
> * testImportCacheEnabledWithoutSrcDir
> [https://app.circleci.com/pipelines/github/instaclustr/cassandra/4199/workflows/a70b41d8-f848-4114-9349-9a01ac082281/jobs/223621/tests]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)