[ https://issues.apache.org/jira/browse/CASSANDRA-19572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840220#comment-17840220 ]

Stefan Miklosovic edited comment on CASSANDRA-19572 at 4/23/24 8:27 PM:
------------------------------------------------------------------------

OK, so more digging... I tried putting "SSTableReader.resetTidying();" into each 
afterTest and it did help. Below is each job, with 5k repetitions each.

4.0 
https://app.circleci.com/pipelines/github/instaclustr/cassandra/4223/workflows/a82d0483-a0df-44ed-8127-088b303c78ba/jobs/225432/steps
4.1 
https://app.circleci.com/pipelines/github/instaclustr/cassandra/4224/workflows/eae7a5e2-89dd-46cd-aaca-1e4250d0fa8b/jobs/225531/steps
5.0 j11 
https://app.circleci.com/pipelines/github/instaclustr/cassandra/4226/workflows/9805ec75-fd02-4c5a-8996-fa5bce71e8c2/jobs/225728/steps
5.0 j17 
https://app.circleci.com/pipelines/github/instaclustr/cassandra/4226/workflows/9805ec75-fd02-4c5a-8996-fa5bce71e8c2/jobs/225727/steps

However, I just noticed that CQLTester, which ImportTest extends, already has an 
afterTest, and I was _not_ calling it (super.afterTest()) from my afterTest. 
CQLTester's afterTest is shown in (1). It removes the tables and deletes all 
SSTables on disk, so I guess it also triggers tidying, just by other means. 
However, that whole operation runs in ScheduledExecutors.optionalTasks, which is 
asynchronous.
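A minimal, self-contained Java sketch of the situation described above, under a simplified model: BaseTester stands in for CQLTester, ImportLikeTest for ImportTest, and a single-threaded executor for ScheduledExecutors.optionalTasks (all three are hypothetical stand-ins, not Cassandra code). It shows that an overriding afterTest that skips super.afterTest() never triggers the base cleanup, and that even when it does call it, the cleanup is only submitted, so afterTest returns before the cleanup has run. A latch holds the cleanup back so that this moment is observable deterministically.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

final class AsyncAfterTestDemo
{
    // Hypothetical stand-in for ScheduledExecutors.optionalTasks: one daemon worker thread.
    static final ExecutorService optionalTasks = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "optionalTasks");
        t.setDaemon(true);
        return t;
    });

    // Hypothetical stand-in for CQLTester.
    static class BaseTester
    {
        final AtomicBoolean cleanedUp = new AtomicBoolean(false);
        // Holds the cleanup back so "afterTest already returned" can be observed reliably.
        final CountDownLatch gate = new CountDownLatch(1);

        void afterTest()
        {
            // Removing tables / deleting SSTables is only submitted; afterTest() returns immediately.
            optionalTasks.execute(() -> {
                try
                {
                    gate.await();
                }
                catch (InterruptedException e)
                {
                    Thread.currentThread().interrupt();
                    return;
                }
                cleanedUp.set(true);
            });
        }
    }

    // Hypothetical stand-in for ImportTest.
    static class ImportLikeTest extends BaseTester
    {
        private final boolean callSuper;

        ImportLikeTest(boolean callSuper)
        {
            this.callSuper = callSuper;
        }

        @Override
        void afterTest()
        {
            // ... test-specific tidying would go here ...
            if (callSuper)
                super.afterTest(); // leaving this out skips the base cleanup entirely
        }
    }

    // Waits until every task submitted so far has run (single thread, so tasks run in order).
    static void drain()
    {
        try
        {
            optionalTasks.submit(() -> { }).get();
        }
        catch (InterruptedException | ExecutionException e)
        {
            throw new RuntimeException(e);
        }
    }

    // Returns {cleaned when afterTest returned, cleaned eventually with super, cleaned eventually without super}.
    static boolean[] run()
    {
        ImportLikeTest withSuper = new ImportLikeTest(true);
        withSuper.afterTest();
        boolean cleanedOnReturn = withSuper.cleanedUp.get(); // false: the cleanup is still pending
        withSuper.gate.countDown();
        drain();
        boolean cleanedEventually = withSuper.cleanedUp.get(); // true once the executor gets to it

        ImportLikeTest withoutSuper = new ImportLikeTest(false);
        withoutSuper.afterTest();
        withoutSuper.gate.countDown();
        drain();
        return new boolean[]{ cleanedOnReturn, cleanedEventually, withoutSuper.cleanedUp.get() };
    }

    public static void main(String[] args)
    {
        System.out.println(Arrays.toString(run())); // [false, true, false]
    }
}
```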

So what happens when we run a test method, afterTest is invoked, and the removal 
is done asynchronously? JUnit does not wait until it is finished, right? I think 
this work might then leak beyond the scope of afterTest, into the next test that 
is run, etc. I feel uneasy about this, and it is probably the real cause of the 
issues we see with these refs. 
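The suspected leak can be sketched in plain Java, assuming a hypothetical shared set of "live SSTables" that outlives a single test method and a hypothetical executor standing in for optionalTasks (neither is Cassandra's API). A latch forces the unlucky interleaving that could otherwise happen by chance: the first test's asynchronous cleanup has not run yet when the next test starts, so the next test still sees the first test's state.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class LeakingCleanupDemo
{
    // Hypothetical shared state that outlives a single test method (think: live SSTables and their refs).
    static final Set<String> liveSSTables = ConcurrentHashMap.newKeySet();

    // Hypothetical stand-in for ScheduledExecutors.optionalTasks.
    static final ExecutorService optionalTasks = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "optionalTasks");
        t.setDaemon(true);
        return t;
    });

    static void testOne()
    {
        liveSSTables.add("test1-sstable");
    }

    // Mirrors an asynchronous afterTest: the runner moves on as soon as this returns.
    // 'delay' holds the cleanup back to force the race deterministically.
    static void afterTest(CountDownLatch delay)
    {
        optionalTasks.execute(() -> {
            try
            {
                delay.await();
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                return;
            }
            liveSSTables.clear();
        });
    }

    // The next test expects a clean slate; it reports how many leftovers it can see.
    static int testTwoCountsLeftovers()
    {
        return liveSSTables.size();
    }

    // Returns {leftovers seen by test two, leftovers once the cleanup has finally run}.
    static int[] run()
    {
        liveSSTables.clear();
        CountDownLatch delay = new CountDownLatch(1);
        testOne();
        afterTest(delay);                              // returns immediately; nothing waits for the cleanup
        int seenByTestTwo = testTwoCountsLeftovers();  // 1: test one's state leaked into test two
        delay.countDown();
        try
        {
            optionalTasks.submit(() -> { }).get();     // single thread: the cleanup ran before this no-op
        }
        catch (InterruptedException | ExecutionException e)
        {
            throw new RuntimeException(e);
        }
        return new int[]{ seenByTestTwo, liveSSTables.size() };
    }

    public static void main(String[] args)
    {
        System.out.println(Arrays.toString(run())); // [1, 0]
    }
}
```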

What I am doing right now is tidying up before calling super.afterTest, and I am 
running multiplex on 4.0 again. If it fails, I guess the next step will be to 
run the logic in afterTest synchronously.
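One possible shape for that next step, as a sketch only: tidy first, then keep the cleanup on the executor but block on its Future (with a timeout) before afterTest returns, so nothing can leak into the next test. The executor, the shared set, and the recorded events are hypothetical stand-ins; the "tidy" step only marks where a call such as SSTableReader.resetTidying() would go, and this is not how CQLTester is actually implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SynchronousCleanupDemo
{
    static final Set<String> liveSSTables = ConcurrentHashMap.newKeySet();
    static final List<String> events = Collections.synchronizedList(new ArrayList<>());

    // Hypothetical stand-in for ScheduledExecutors.optionalTasks.
    static final ExecutorService optionalTasks = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "optionalTasks");
        t.setDaemon(true);
        return t;
    });

    static void afterTest()
    {
        // 1. Test-specific tidying first (placeholder for e.g. a resetTidying() call).
        events.add("tidy");

        // 2. The base-class cleanup still runs on the executor...
        Future<?> cleanup = optionalTasks.submit(() -> {
            liveSSTables.clear();
            events.add("cleanup");
        });

        // 3. ...but afterTest blocks until it has finished, so nothing leaks into the next test.
        try
        {
            cleanup.get(1, TimeUnit.MINUTES);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        catch (ExecutionException | TimeoutException e)
        {
            throw new RuntimeException(e);
        }
        events.add("afterTest returned");
    }

    // Returns how many SSTables the next test would still see.
    static int run()
    {
        liveSSTables.clear();
        events.clear();
        liveSSTables.add("test1-sstable"); // state left behind by the first test
        afterTest();
        return liveSSTables.size();        // 0: the cleanup completed before afterTest returned
    }

    public static void main(String[] args)
    {
        System.out.println(run() + " " + events); // 0 [tidy, cleanup, afterTest returned]
    }
}
```

The timeout on Future.get is a design choice for the sketch: a stuck cleanup then fails the test loudly instead of hanging the whole run.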

(1) 
https://github.com/apache/cassandra/blob/cassandra-4.1/test/unit/org/apache/cassandra/cql3/CQLTester.java#L417-L433


> Test failure: org.apache.cassandra.db.ImportTest flakiness
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-19572
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19572
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tool/bulk load
>            Reporter: Brandon Williams
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
>
> As discovered on CASSANDRA-19401, the tests in this class are flaky, at least 
> the following:
>  * testImportCorruptWithoutValidationWithCopying
>  * testImportInvalidateCache
>  * testImportCorruptWithCopying
>  * testImportCacheEnabledWithoutSrcDir
> [https://app.circleci.com/pipelines/github/instaclustr/cassandra/4199/workflows/a70b41d8-f848-4114-9349-9a01ac082281/jobs/223621/tests]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)