[ 
https://issues.apache.org/jira/browse/CASSANDRA-17519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525871#comment-17525871
 ] 

Benedict Elliott Smith commented on CASSANDRA-17519:
----------------------------------------------------

I suspect two things:

1) When originally written, this code depended on the assumption that there was 
mutual exclusion when creating one of these tidy objects, and that assumption 
was later broken (or perhaps was always false);

2) A variant of this race condition was encountered by the simulator when 
validating Paxos, and I fixed it without paying much attention to get things 
moving (perhaps without even intending to properly fix it at the time, as there 
was too much to do), and then forgot about it.

I'll try to find time to perform a proper analysis of your report and the wider 
problems.

> races/leaks in SSTableReader::GlobalTidy
> ----------------------------------------
>
>                 Key: CASSANDRA-17519
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17519
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core
>            Reporter: Jakub Zytka
>            Assignee: Jakub Zytka
>            Priority: Normal
>         Attachments: CASSANDRA-17519-4.0.txt, CASSANDRA-17519-4.1-fix.txt, 
> CASSANDRA-17519-4.1-test-exposing-the-problem.txt
>
>
> In Cassandra 4.0/3.11 there are at least two races in 
> SSTableReader::GlobalTidy
> One is a get/get race, explicitly handled as an assertion in:
> [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2199-L2204]
> and it looks like "ok, it's a problem, but let's just not fix it"
> The other one is get/tidy race between 
> [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2194-L2196]
> and
> [http://github.com/apache/cassandra/blob/c22accc46458d0a583afcf6a980f731cdcc94465/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2174-L2175]
>  
> The second one can be easily hit by adding a small delay at the beginning of 
> `tidy()` method (say, 20ms) and running `LongStreamingTest` (and actually 
> such failure is what prompted the investigation of GlobalTidy correctness)
> There was an attempt on `trunk` to fix these two races.
> The details are not clear to me, and it all looks quite weird. I might be 
> mistaken, but as far as I can see the relevant changes were introduced in:
> [https://github.com/apache/cassandra/commit/31bea0b0d41e4e81095f0d088094f03db14af490]
> that is piggybacked on a huge change in CASSANDRA-17008, without a separate 
> ticket or any sort of qa.
> As far as I can see this attempt changes the first race into a leak, and the 
> second race to another race, this time allowing to have multiple GlobalTidy 
> objects for the same sstable (and, as a result, a premature running of 
> obsoletion code).
> I'll follow up with PRs for relevant branches etc etc



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to