[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

Jon Haddad (Jira) Fri, 08 Mar 2024 15:39:07 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824885#comment-17824885
 ]


Jon Haddad commented on CASSANDRA-19429:
----------------------------------------

When I try to spin up those instance types in us-west-2 I get an error that 
they're invalid, so I'm running a test with c5.18xlarge.

I'm working with a single node, with compaction disabled, and I reduced my 
memtable space to 16MB in order to flush constantly.  I wrote 10m rows and have 
1928 SSTables.  These boxes have 72 CPUs.  I'm using G1GC with a 24GB heap.  
I've tested concurrent_reads at 64 and 128, since there are enough cores here 
to handle it and we don't want to bottleneck on reads.
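
For anyone reproducing this setup, here is a minimal sketch of how the SSTable count could be checked from the shell. It assumes one {{*-Data.db}} component per SSTable and skips snapshot and backup directories. The {{count_sstables}} helper and the example path are hypothetical, not part of Cassandra.

```shell
# Hypothetical helper: count SSTables by counting their Data.db components.
# Assumes one "*-Data.db" file per SSTable; snapshots and backups are skipped
# because they hold hard links to the same files.
count_sstables() {
    dir="$1"
    find "$dir" -type f -name '*-Data.db' \
        ! -path '*/snapshots/*' ! -path '*/backups/*' | wc -l | tr -d ' '
}

# Example (the path is an assumption; adjust to your data directory):
#   bin/nodetool disableautocompaction
#   count_sstables data/data/<keyspace>
```

{{nodetool tablestats}} also reports an SSTable count per table, which may be the simpler check on a live node.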

So, right off the bat, I'm not able to duplicate the original observation about 
CPU not going over 50% utilization.  4.1 has reached 90+% CPU utilization:
{noformat}
23:29:57     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
23:29:58     all   90.11    0.00    1.07    0.00    0.00    0.24    0.01    0.00    0.00    8.57
23:29:59     all   90.12    0.00    0.84    0.00    0.00    0.27    0.00    0.00    0.00    8.77
23:30:00     all   89.82    0.00    0.83    0.03    0.00    0.38    0.01    0.00    0.00    8.93
23:30:01     all   90.13    0.03    1.08    0.00    0.00    0.30    0.00    0.00    0.00    8.47
23:30:02     all   89.95    0.00    0.89    0.00    0.00    0.34    0.01    0.00    0.00    8.82
23:30:03     all   89.86    0.00    1.08    0.00    0.00    0.24    0.00    0.00    0.00    8.83
23:30:04     all   87.90    0.00    0.97    0.00    0.00    0.24    0.01    0.00    0.00   10.88
{noformat}
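
To turn samples like these into one number, here is a minimal sketch that averages CPU busy time. It assumes each sample sits on a single line with %idle as the last field, and it treats busy as 100 minus %idle. The {{avg_cpu_busy}} name is hypothetical.

```shell
# Hypothetical helper: average CPU busy percentage from mpstat-style rows
# read on stdin. Only rows whose second field is "all" are used; busy is
# computed as 100 - %idle, with %idle assumed to be the last field.
avg_cpu_busy() {
    awk '$2 == "all" { busy += 100 - $NF; n++ }
         END { if (n > 0) printf "%.2f\n", busy / n }'
}

# Example: avg_cpu_busy < cpu_samples.txt   (file name is an assumption)
```

For the seven samples shown here, %idle averages about 9.04, so the helper reports roughly 91% busy.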
Using a variety of easy-cass-stress KeyValue workloads with different settings 
for --rate, I'm unable to see any meaningful difference, performance-wise.
{noformat}
easy-cass-stress run KeyValue -d 20m --rate 20k -p 10m -t 16 -r 1
{noformat}
For each workload I've run, I've seen virtually identical results.  Both are 
pushing C* to use 90% CPU and achieve roughly 25K reads / second.
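
A minimal sketch of how the same workload could be swept across several --rate values. Only the 20k rate appears in this comment; the {{stress_cmds}} helper and any other rate passed to it are assumptions for illustration.

```shell
# Hypothetical helper: print one easy-cass-stress command per --rate value.
# All other flags are copied unchanged from the command shown in this comment.
stress_cmds() {
    for rate in "$@"; do
        echo "easy-cass-stress run KeyValue -d 20m --rate $rate -p 10m -t 16 -r 1"
    done
}

# Example: stress_cmds 10k 20k 40k   (prints the commands; pipe to sh to run)
```

Printing the commands rather than running them keeps each run's settings easy to record next to its results.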

> Remove lock contention generated by getCapacity function in SSTableReader
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19429
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/SSTable
>            Reporter: Dipietro Salvatore
>            Assignee: Dipietro Salvatore
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x
>
>         Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && \
> CASSANDRA_USE_JDK11=true ant jar && \
> CASSANDRA_USE_JDK11=true ant stress-build && \
> rm -rf data && bin/cassandra -f -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && \
> tools/bin/cassandra-stress write n=10000000 cl=ONE -rate threads=384 \
>   -node 127.0.0.1 -log file=cload.log -graph file=cload.html && \
> bin/nodetool compact keyspace1 && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m \
>   cl=ONE -rate threads=406 -node localhost -log file=result.log \
>   -graph file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
