[ https://issues.apache.org/jira/browse/CASSANDRA-13422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ariel Weisberg updated CASSANDRA-13422: --------------------------------------- Summary: CompactionStrategyManager should take write not read lock when handling remove notifications (was: CompactionStrategyManager should take write not read lock when handling add/remove notifications) > CompactionStrategyManager should take write not read lock when handling > remove notifications > -------------------------------------------------------------------------------------------- > > Key: CASSANDRA-13422 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13422 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths > Reporter: Ariel Weisberg > Assignee: Ariel Weisberg > Fix For: 4.0, 3.11.x > > > {{getNextBackgroundTask}} in various compaction strategies (definitely > {{LCS}}) rely on checking the result of {{DataTracker.getCompacting()}} to > avoid accessing data and metadata related to tables that have already head > their resources released. > There is a race where this check is unreliable and will claim a table that > has its resources already released is not compacting resulting in use after > free. > [{{LeveledCompactionStrategy.findDroppableSSTable}}|https://github.com/apache/cassandra/blob/c794d2bed7ca1d10e13c4da08a3d45f5c755c1d8/src/java/org/apache/cassandra/db/compaction/LeveledCompactionStrategy.java#L504] > for instance has this three part logical && condition where the first check > is against the compacting set before calling {{worthDroppingTombstones}} > which fails if the table has been released. > The order of events is basically that CompactionStrategyManager acquires the > read lock in getNextBackgroundTask(), then proceeds eventually to > findDroppableSSTable and acquires a set of SSTables from the manifest. While > the manifest is thread safe it's not accessed atomically WRT to other > operations. Once it has acquire the set of tables it acquires the (not > atomically) the set of compacting SSTables and iterates checking the former > against the latter. > Meanwhile other compaction threads are marking tables obsolete or compacted > and releasing their references. Doing this removes them from {{DataTracker}} > and publishes a notification to the strategies, but this notification only > requires the read lock. After the compaction thread has published the > notifications it eventually marks the table as not compacting in > {{DataTracker}} or removes it entirely. > The race is then that the compaction thread generating a new background task > acquires the sstables from the manifest on the stack. Any table in that set > that was compacting at that time must remain compacting so that it can be > skipped. Another compaction thread finishes a compaction and is able to > remove the table from the manifest and then remove it from the compacting > set. The thread generating the background task then acquires the list of > compacting tables which doesn't include the table it is supposed to skip. > The simple fix appears to be to require threads to acquire the write lock in > order to publish notifications of tables being added/removed from compaction > strategies. While holding the write lock it won't be possible for someone to > see a view of tables in the manifest where tables that are compacting aren't > compacting in the view -- This message was sent by Atlassian JIRA (v6.3.15#6346)