[https://issues.apache.org/jira/browse/CASSANDRA-18443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715088#comment-17715088]

Jon Meredith commented on CASSANDRA-18443:
------------------------------------------

Starting commit

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-4.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18443-cassandra-4.0-03ECF093-C946-410B-85D2-BF4C5F6ACFB4]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18443-cassandra-4.0-03ECF093-C946-410B-85D2-BF4C5F6ACFB4]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2431/]|
|cassandra-4.1|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18443-cassandra-4.1-03ECF093-C946-410B-85D2-BF4C5F6ACFB4]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18443-cassandra-4.1-03ECF093-C946-410B-85D2-BF4C5F6ACFB4]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2432/]|
|trunk|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18443-trunk-03ECF093-C946-410B-85D2-BF4C5F6ACFB4]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18443-trunk-03ECF093-C946-410B-85D2-BF4C5F6ACFB4]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2433/]|

> Deadlock updating sstable metadata if disk boundaries need reloading
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-18443
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18443
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Compaction, Local/Memtable, Local/SSTable
>            Reporter: Jon Meredith
>            Assignee: Jon Meredith
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0
>
>
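> {{CompactionStrategyManager.handleNotification}} holds the read lock while
> processing notifications. When handling a metadata-changed notification, it
> makes an extra call to {{maybeReloadDiskBoundaries}}, which tries to acquire
> the write lock while the same thread still holds the read lock. Because a
> {{ReentrantReadWriteLock}} cannot be upgraded from a read lock to a write
> lock, the thread deadlocks on itself.
> Partial stack trace: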
> {code}
>         at jdk.internal.misc.Unsafe.park(java.base@11.0.16/Native Method)
>         - parking to wait for  <0x00000005cc000078> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire
>         at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock
>         at org.apache.cassandra.db.compaction.CompactionStrategyManager.maybeReloadDiskBoundaries(CompactionStrategyManager.java:495)
>         at org.apache.cassandra.db.compaction.CompactionStrategyManager.getCompactionStrategyFor(CompactionStrategyManager.java:343)
>         at org.apache.cassandra.db.compaction.CompactionStrategyManager.handleMetadataChangedNotification(CompactionStrategyManager.java:796)
>         at org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(CompactionStrategyManager.java:838)
>         at org.apache.cassandra.db.lifecycle.Tracker.notifySSTableMetadataChanged(Tracker.java:482)
>         at org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(CompactionStrategyManager.java:838)
> {code}
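The self-deadlock described above can be reproduced in miniature with nothing but the JDK. This is a sketch, not Cassandra code: the class {{ReadToWriteUpgradeDemo}} and its method are hypothetical and only mirror the roles of {{handleNotification}} (holds the read lock) and {{maybeReloadDiskBoundaries}} (asks for the write lock). It uses {{tryLock}} with a timeout instead of {{lock()}} so that it reports the failure instead of hanging the way the real thread did.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; not part of Cassandra.
public class ReadToWriteUpgradeDemo {

    // Returns true only if this thread could take the write lock while
    // already holding the read lock (it never can with ReentrantReadWriteLock).
    static boolean tryUpgrade(ReentrantReadWriteLock lock) {
        lock.readLock().lock(); // plays the role of handleNotification's read lock
        try {
            // Plays the role of maybeReloadDiskBoundaries asking for the write lock.
            // A bounded tryLock is used so the demo returns instead of parking forever.
            boolean acquired = lock.writeLock().tryLock(100, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        System.out.println("upgrade succeeded: " + tryUpgrade(lock)); // prints: upgrade succeeded: false

        // Once the read lock is released, the same thread gets the write lock immediately.
        boolean free = lock.writeLock().tryLock();
        if (free) {
            lock.writeLock().unlock();
        }
        System.out.println("write lock free: " + free); // prints: write lock free: true
    }
}
```

With a plain {{lock()}} call in place of the bounded {{tryLock}}, the thread would park indefinitely, which matches the {{Unsafe.park}} frame at the top of the stack trace.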
> Because the deadlocked thread keeps holding the read lock, the
> SlabpoolCleaner blocks when it notifies the ColumnFamilyStore, so memtables
> can no longer be flushed and recycled. Any thread applying a mutation to the
> database (at least GossipStage and MutationStage) then stalls, so the node is
> considered down by its peers and/or those stages back up with pending
> requests.
> All the cases investigated occurred during single-SSTable upleveling by
> {{org.apache.cassandra.db.compaction.SingleSSTableLCSTask}}, which was added
> in CASSANDRA-12526.
> Other, less critical work was also affected: JMX calls to get the estimated
> number of remaining compaction tasks, the index summary manager
> redistributing summaries, the StatusLogger trying to log dropped messages,
> and the ValidationManager.
> The workaround is to reboot the affected host.
> The fix is simply to remove the redundant disk-boundary reload check from
> that path.
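The shape of the fix can be sketched with a heavily simplified, hypothetical stand-in for {{CompactionStrategyManager}}. This is not the actual patch: the class {{NotificationPathSketch}}, the helper {{strategyForLocked}}, and the returned strings are all invented for illustration, and only the names {{maybeReloadDiskBoundaries}} and {{getCompactionStrategyFor}} come from the report. The point it shows is that the notification path, which already holds the read lock, uses a lookup with no disk-boundary reload check, while the public entry point still reloads before taking the read lock.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical toy stand-in for CompactionStrategyManager; not Cassandra code.
public class NotificationPathSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Thread-safe list, since several readers may append under the shared read lock.
    private final List<String> handled = new CopyOnWriteArrayList<>();

    // Takes the write lock, so it must never run while this thread holds the read lock.
    void maybeReloadDiskBoundaries() {
        lock.writeLock().lock();
        try {
            // A real implementation would recompute disk boundaries here.
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Public entry point: reload first, then look up under the read lock.
    String getCompactionStrategyFor(String sstable) {
        maybeReloadDiskBoundaries();
        lock.readLock().lock();
        try {
            return strategyForLocked(sstable);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Hypothetical helper: caller must already hold the read lock; no reload check.
    private String strategyForLocked(String sstable) {
        return "strategy-for-" + sstable;
    }

    // Notification path: already under the read lock, so it calls the
    // no-reload helper instead of getCompactionStrategyFor.
    void handleMetadataChanged(String sstable) {
        lock.readLock().lock();
        try {
            handled.add(strategyForLocked(sstable));
        } finally {
            lock.readLock().unlock();
        }
    }

    List<String> handled() {
        return handled;
    }

    public static void main(String[] args) {
        NotificationPathSketch m = new NotificationPathSketch();
        m.handleMetadataChanged("nb-1-big");  // returns instead of deadlocking
        System.out.println(m.handled());      // prints: [strategy-for-nb-1-big]
    }
}
```

Keeping the reload outside the read-locked section means the write lock is only ever requested by a thread that holds no read lock, which is exactly the condition the deadlock in this report violated.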



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
