[ https://issues.apache.org/jira/browse/CASSANDRA-18443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715088#comment-17715088 ]
Jon Meredith commented on CASSANDRA-18443:
------------------------------------------

Starting commit

CI Results (pending):
||Branch||Source||Circle CI||Jenkins||
|cassandra-4.0|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18443-cassandra-4.0-03ECF093-C946-410B-85D2-BF4C5F6ACFB4]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18443-cassandra-4.0-03ECF093-C946-410B-85D2-BF4C5F6ACFB4]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2431/]|
|cassandra-4.1|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18443-cassandra-4.1-03ECF093-C946-410B-85D2-BF4C5F6ACFB4]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18443-cassandra-4.1-03ECF093-C946-410B-85D2-BF4C5F6ACFB4]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2432/]|
|trunk|[branch|https://github.com/jonmeredith/cassandra/tree/commit_remote_branch/CASSANDRA-18443-trunk-03ECF093-C946-410B-85D2-BF4C5F6ACFB4]|[build|https://app.circleci.com/pipelines/github/jonmeredith/cassandra?branch=commit_remote_branch%2FCASSANDRA-18443-trunk-03ECF093-C946-410B-85D2-BF4C5F6ACFB4]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/2433/]|

> Deadlock updating sstable metadata if disk boundaries need reloading
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-18443
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18443
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Compaction, Local/Memtable, Local/SSTable
>            Reporter: Jon Meredith
>            Assignee: Jon Meredith
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0
>
>
> {{CompactionStrategyManager.handleNotification}} holds the read lock while
> processing notifications.
> When handling metadata changed notifications, an extra call is made to
> maybeReloadDiskBoundaries, which tries to grab the write lock and deadlocks
> the thread.
>
> Partial stacktrace
> {code}
> at jdk.internal.misc.Unsafe.park(java.base@11.0.16/Native Method)
> - parking to wait for <0x00000005cc000078> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire
> at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock
> at org.apache.cassandra.db.compaction.CompactionStrategyManager.maybeReloadDiskBoundaries(CompactionStrategyManager.java:495)
> at org.apache.cassandra.db.compaction.CompactionStrategyManager.getCompactionStrategyFor(CompactionStrategyManager.java:343)
> at org.apache.cassandra.db.compaction.CompactionStrategyManager.handleMetadataChangedNotification(CompactionStrategyManager.java:796)
> at org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(CompactionStrategyManager.java:838)
> at org.apache.cassandra.db.lifecycle.Tracker.notifySSTableMetadataChanged(Tracker.java:482)
> at org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(CompactionStrategyManager.java:838)
> {code}
>
> Deadlocking with the read lock held blocks the SlabpoolCleaner while it
> notifies ColumnFamilyStore, so memtables cannot be flushed and recycled.
> Any thread applying a mutation to the database (at least GossipStage and
> MutationStage) then stalls, causing the node to be considered down by peers
> and/or to back up with pending requests.
>
> All the cases investigated were during single sstable upleveling by
> {{org.apache.cassandra.db.compaction.SingleSSTableLCSTask}} added in
> CASSANDRA-12526.
>
> Other less critical work was also affected: JMX calls to get estimated
> remaining compaction tasks, the index summary manager redistributing
> summaries, the StatusLogger trying to log dropped messages, and the
> ValidationManager.
>
> The workaround is to reboot the affected host.
>
> The fix is to just remove the redundant disk boundary reload check on that
> path.
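
For readers unfamiliar with the lock behaviour behind this issue: java.util.concurrent.locks.ReentrantReadWriteLock does not allow a thread to upgrade from the read lock to the write lock, so a thread that calls writeLock().lock() while still holding the read lock waits forever. The following is a minimal standalone sketch of that behaviour, not Cassandra code. The class and method names are hypothetical, and it uses tryLock() instead of lock() so that the demonstration returns a result rather than hanging.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; it only illustrates the read-to-write upgrade problem.
public class ReadToWriteUpgradeDemo {

    // Tries to take the write lock while the current thread still holds the read lock.
    // With lock() instead of tryLock() this call would block forever.
    public static boolean canUpgradeWhileHoldingReadLock() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            boolean acquired = lock.writeLock().tryLock();
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Releases the read lock first, then takes the write lock; this succeeds.
    public static boolean canAcquireWriteAfterReleasingRead() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        lock.readLock().unlock();
        boolean acquired = lock.writeLock().tryLock();
        if (acquired) {
            lock.writeLock().unlock();
        }
        return acquired;
    }

    public static void main(String[] args) {
        System.out.println("write lock while holding read lock: "
                + canUpgradeWhileHoldingReadLock());
        System.out.println("write lock after releasing read lock: "
                + canAcquireWriteAfterReleasingRead());
    }
}
```

The first call reports that the write lock is unavailable, which corresponds to the parked thread in the stack trace above. Avoiding the write-lock request on a path that already holds the read lock, as the fix does by removing the redundant reload check, sidesteps the problem.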