[jira] [Commented] (CASSANDRA-19776) Spinning trying to capture readers

Carlos Rolo (Jira) Fri, 26 Jul 2024 02:10:29 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-19776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868900#comment-17868900
 ]


Carlos Rolo commented on CASSANDRA-19776:
-----------------------------------------

Could this be related: [CASSANDRA-13338] JMX: EstimatedPartitionCount / 
SnapshotSize are expensive - ASF JIRA (apache.org) ?

> Spinning trying to capture readers
> ----------------------------------
>
>                 Key: CASSANDRA-19776
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19776
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Cameron Zemek
>            Priority: Normal
>         Attachments: extract.log
>
>
> On a handful of clusters we are noticing Spin locks occurring. I traced back 
> all the calls to the EstimatedPartitionCount metric (eg. 
> org.apache.cassandra.metrics:type=Table,keyspace=testks,scope=testcf,name=EstimatedPartitionCount)
> Using the following patched function:
> {code:java}
>     public RefViewFragment selectAndReference(Function<View, 
> Iterable<SSTableReader>> filter)
>     {
>         long failingSince = -1L;
>         boolean first = true;
>         while (true)
>         {
>             ViewFragment view = select(filter);
>             Refs<SSTableReader> refs = Refs.tryRef(view.sstables);
>             if (refs != null)
>                 return new RefViewFragment(view.sstables, view.memtables, 
> refs);
>             if (failingSince <= 0)
>             {
>                 failingSince = System.nanoTime();
>             }
>             else if (System.nanoTime() - failingSince > 
> TimeUnit.MILLISECONDS.toNanos(100))
>             {
>                 List<SSTableReader> released = new ArrayList<>();
>                 for (SSTableReader reader : view.sstables)
>                     if (reader.selfRef().globalCount() == 0)
>                         released.add(reader);
>                 NoSpamLogger.log(logger, NoSpamLogger.Level.WARN, 1, 
> TimeUnit.SECONDS,
>                                  "Spinning trying to capture readers {}, 
> released: {}, ", view.sstables, released);
>                 if (first)
>                 {
>                     first = false;
>                     try {
>                         throw new RuntimeException("Spinning trying to 
> capture readers");
>                     } catch (Exception e) {
>                         logger.warn("Spin lock stacktrace", e);
>                     }
>                 }
>                 failingSince = System.nanoTime();
>             }
>         }
>     }
>  {code}
> Digging into this code I found it will fail if any of the sstables are in 
> released state (ie. reader.selfRef().globalCount() == 0).
> See the extract.log for an example of one of these spin lock occurrences. 
> Sometimes these spin locks last over 5 minutes. Across the worst cluster with 
> this issue, I ran a log processing script that everytime the 'Spinning trying 
> to capture readers' was different to previous one it would output if the 
> released tables were in Compacting state. Every single occurrence has it spin 
> locking with released listing a sstable that is compacting.
> In the extract.log example its spin locking saying that nb-320533-big-Data.db 
> has been released. But you can see prior to it spinning that sstable is 
> involved in a compaction. The compaction completes at 01:03:36 and the 
> spinning stops. nb-320533-big-Data.db is deleted at 01:03:49 along with the 
> other 9 sstables involved in the compaction.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-19776) Spinning trying to capture readers

Reply via email to