[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821460#comment-17821460
 ] 

Dipietro Salvatore commented on CASSANDRA-19429:
------------------------------------------------

From ~500k to ~150k ops. I am using all default settings, with no optimizations
on Cassandra or the OS.

Yes, that instance has a 96-core CPU, but CPU utilization is only around 25%
when it reaches 150k ops. In addition, this is not specific to Graviton CPUs,
since it also happens on Intel.

I have tested the 50/50 R/W setting with:

 
{code:java}
bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
bin/nodetool clearsnapshot --all && \
tools/bin/cassandra-stress write n=10000000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log -graph file=cload.html && \
bin/nodetool compact keyspace1 && sleep 30s && \
tools/bin/cassandra-stress mixed ratio\(write=50,read=50\) duration=10m cl=ONE -rate threads=100 -node localhost -log file=result.log -graph file=graph.html
{code}

Results:

 

- 4.1.3 released:
{code:java}
Results:
Op rate                   :  142,571 op/s  [READ: 71,293 op/s, WRITE: 71,278 op/s]
Partition rate            :  142,571 pk/s  [READ: 71,293 pk/s, WRITE: 71,278 pk/s]
Row rate                  :  142,571 row/s [READ: 71,293 row/s, WRITE: 71,278 row/s]
Latency mean              :    0.7 ms [READ: 1.3 ms, WRITE: 0.1 ms]
Latency median            :    0.2 ms [READ: 1.2 ms, WRITE: 0.1 ms]
Latency 95th percentile   :    2.0 ms [READ: 2.3 ms, WRITE: 0.2 ms]
Latency 99th percentile   :    2.6 ms [READ: 2.9 ms, WRITE: 0.2 ms]
Latency 99.9th percentile :    8.4 ms [READ: 9.6 ms, WRITE: 0.4 ms]
Latency max               :   51.0 ms [READ: 51.0 ms, WRITE: 47.7 ms]
Total partitions          : 85,661,309 [READ: 42,835,266, WRITE: 42,826,043]
Total errors              :          0 [READ: 0, WRITE: 0]
Total GC count            : 1,310
Total GC memory           : 2067.821 GiB
Total GC time             :    9.1 seconds
Avg GC time               :    7.0 ms
StdDev GC time            :    3.6 ms
Total operation time      : 00:10:00 {code}

- 4.1.3 with patch:
{code:java}
Results:
Op rate                   :  459,728 op/s  [READ: 229,910 op/s, WRITE: 229,818 op/s]
Partition rate            :  459,728 pk/s  [READ: 229,910 pk/s, WRITE: 229,818 pk/s]
Row rate                  :  459,728 row/s [READ: 229,910 row/s, WRITE: 229,818 row/s]
Latency mean              :    0.2 ms [READ: 0.3 ms, WRITE: 0.2 ms]
Latency median            :    0.2 ms [READ: 0.2 ms, WRITE: 0.1 ms]
Latency 95th percentile   :    0.3 ms [READ: 0.3 ms, WRITE: 0.2 ms]
Latency 99th percentile   :    0.4 ms [READ: 0.6 ms, WRITE: 0.3 ms]
Latency 99.9th percentile :    8.4 ms [READ: 8.9 ms, WRITE: 7.4 ms]
Latency max               : 1887.4 ms [READ: 1,887.4 ms, WRITE: 48.1 ms]
Total partitions          : 275,966,298 [READ: 138,010,917, WRITE: 137,955,381]
Total errors              :          0 [READ: 0, WRITE: 0]
Total GC count            : 4,438
Total GC memory           : 6971.464 GiB
Total GC time             :   33.7 seconds
Avg GC time               :    7.6 ms
StdDev GC time            :    3.8 ms
Total operation time      : 00:10:00 {code}

Increasing the percentage of writes in the workload increases the performance
gap between the patched and unpatched builds (3.2x in this 50/50 run).
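
For reference, the 3.2x figure can be rechecked from the two "Op rate" lines
above. This is only a small sketch: the 50/50 numbers are copied from the
results in this comment, the 10/90 numbers are the rounded r8g.24xlarge values
from the table in the issue description, and the helper name `speedup` is made
up for illustration. It assumes both runs are comparable (same instance type),
which this comment does not state explicitly.

```python
def speedup(patched_ops: float, baseline_ops: float) -> float:
    """Return the throughput ratio of the patched build over the baseline."""
    return patched_ops / baseline_ops


# 50/50 R/W run from this comment (cassandra-stress "Op rate" lines).
ratio_50_50 = speedup(459_728, 142_571)

# 10/90 R/W run, rounded values from the issue description table (r8g.24xlarge).
ratio_10_90 = speedup(496_000, 168_000)

print(f"50/50 R/W speedup: {ratio_50_50:.2f}x")
print(f"10/90 R/W speedup: {ratio_10_90:.2f}x")
```

With these inputs the 50/50 ratio comes out higher than the 10/90 ratio,
which is consistent with the gap growing as the write share increases.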

Did you have a chance to test it on large instances on your end?

> Remove lock contention generated by getCapacity function in SSTableReader
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19429
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/SSTable
>            Reporter: Dipietro Salvatore
>            Assignee: Dipietro Salvatore
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x
>
>         Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && 
> CASSANDRA_USE_JDK11=true ant stress-build  && rm -rf data && bin/cassandra -f 
> -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write 
> n=10000000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log 
> -graph file=cload.html && \
> bin/nodetool compact keyspace1   && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m 
> cl=ONE -rate threads=406 -node localhost -log file=result.log -graph 
> file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org