[jira] [Commented] (CASSANDRA-19429) Remove lock contention generated by getCapacity function in SSTableReader

Stefan Miklosovic (Jira) Mon, 06 May 2024 11:02:04 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843813#comment-17843813
 ]


Stefan Miklosovic commented on CASSANDRA-19429:
-----------------------------------------------

I think that size and capacity are truly two different concepts, as already 
explained by Brandon and myself, and I believe that setting capacity to 0 
practically disables the respective cache. 

CI for 4.1 looks good with the patch below. I identified a couple more places 
where we could read the capacity from DatabaseDescriptor rather than calling 
getCapacity() on the cache to check whether it is greater than zero before 
proceeding.

One just needs to be careful not to forget to update DatabaseDescriptor when 
the capacity is set to zero, e.g. via JMX, if we are going to use DD as the 
source of truth for whether a cache is enabled.
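
To illustrate the idea, here is a minimal sketch and not the actual patch. All class and method names in it (LockedCache, ConfigSketch, CacheServiceSketch) are hypothetical stand-ins rather than Cassandra's real classes; ConfigSketch only plays the role that DatabaseDescriptor would play. It assumes that the capacity getter on the cache takes a lock, and shows a runtime setter updating both the configuration value and the cache so that the hot path can answer "is the cache enabled?" without touching that lock.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical cache whose capacity accessors are guarded by a lock,
// standing in for the contended getCapacity() call discussed in the ticket.
final class LockedCache {
    private final ReentrantLock lock = new ReentrantLock();
    private long capacity;

    LockedCache(long capacity) {
        this.capacity = capacity;
    }

    long getCapacity() {
        lock.lock();
        try {
            return capacity;
        } finally {
            lock.unlock();
        }
    }

    void setCapacity(long capacity) {
        lock.lock();
        try {
            this.capacity = capacity;
        } finally {
            lock.unlock();
        }
    }
}

// Hypothetical configuration holder (the role DD plays in the comment).
// A volatile field makes reads lock-free and visible across threads.
final class ConfigSketch {
    private static volatile long keyCacheCapacityBytes;

    private ConfigSketch() {}

    static long getKeyCacheCapacityBytes() {
        return keyCacheCapacityBytes;
    }

    static void setKeyCacheCapacityBytes(long bytes) {
        keyCacheCapacityBytes = bytes;
    }
}

// Hypothetical service that owns the cache and exposes a runtime setter.
final class CacheServiceSketch {
    private final LockedCache cache;

    CacheServiceSketch(long initialCapacityBytes) {
        ConfigSketch.setKeyCacheCapacityBytes(initialCapacityBytes);
        this.cache = new LockedCache(initialCapacityBytes);
    }

    // Runtime (e.g. JMX-style) setter: it must update the configuration
    // as well as the cache, otherwise the two drift apart.
    void setCapacity(long bytes) {
        ConfigSketch.setKeyCacheCapacityBytes(bytes);
        cache.setCapacity(bytes);
    }

    // Hot-path check: reads the configuration only, never the cache lock.
    boolean isCacheEnabled() {
        return ConfigSketch.getKeyCacheCapacityBytes() > 0;
    }

    // Slow-path accessor, shown only to compare against the configuration.
    long cacheCapacity() {
        return cache.getCapacity();
    }
}
```

The important part is the setter: if a runtime change wrote only to the cache, isCacheEnabled() would keep reporting the stale configuration value.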

As for other branches, this ticket, as I write this, lists only 4.0 and 4.1 as 
fix versions, but that should be extended up to trunk.

All of this applies only if there is an appetite to do this across all versions 
instead of just fixing the trace messages as suggested by Jon. The build below 
tells me that we are onto something here, and it would just be a matter of 
applying the same change elsewhere.

[CASSANDRA-19429-4.1|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19429-4.1]
{noformat}
java11_pre-commit_tests                         
java11_separate_tests                            
java8_pre-commit_tests                          
  ✓ j8_build                                         6m 53s
  ✓ j8_cqlsh_dtests_py3                              6m 53s
  ✓ j8_cqlsh_dtests_py311                            8m 26s
  ✓ j8_cqlsh_dtests_py311_vnode                      6m 53s
  ✓ j8_cqlsh_dtests_py38                              8m 3s
  ✓ j8_cqlsh_dtests_py38_vnode                       7m 40s
  ✓ j8_cqlsh_dtests_py3_vnode                        8m 16s
  ✓ j8_cqlshlib_cython_tests                        13m 32s
  ✓ j8_cqlshlib_tests                               11m 44s
  ✓ j8_dtests                                       33m 23s
  ✓ j8_dtests_vnode                                 38m 26s
  ✓ j8_jvm_dtests                                   19m 30s
  ✓ j8_jvm_dtests_vnode                             12m 23s
  ✓ j8_simulator_dtests                              5m 51s
  ✓ j11_jvm_dtests_vnode                            11m 50s
  ✓ j11_jvm_dtests                                  19m 47s
  ✓ j11_dtests_vnode                                 35m 6s
  ✓ j11_dtests                                      34m 46s
  ✓ j11_cqlshlib_tests                               6m 35s
  ✓ j11_cqlshlib_cython_tests                        9m 27s
  ✓ j11_cqlsh_dtests_py3_vnode                       5m 47s
  ✓ j11_cqlsh_dtests_py38_vnode                       6m 7s
  ✓ j11_cqlsh_dtests_py38                            5m 41s
  ✓ j11_cqlsh_dtests_py311_vnode                     5m 50s
  ✓ j11_cqlsh_dtests_py311                           5m 56s
  ✓ j11_cqlsh_dtests_py3                             5m 38s
  ✕ j8_unit_tests                                    9m 53s
      org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
      org.apache.cassandra.net.ConnectionTest testMessageDeliveryOnReconnect
  ✕ j8_utests_system_keyspace_directory              11m 5s
      org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
  ✕ j11_unit_tests                                   8m 12s
      org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
java8_separate_tests                             
{noformat}

[java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4285/workflows/7470a92f-1cb3-487d-9d0a-dd1f781a79e8]
[java11_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4285/workflows/2116e61d-1e6a-4589-85e7-1980acfbdb05]
[java8_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4285/workflows/30501a49-c2b0-4aaf-a504-f087f27e88f7]
[java8_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4285/workflows/232134f1-5956-4346-afa3-bd556b6c5f60]


> Remove lock contention generated by getCapacity function in SSTableReader
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19429
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/SSTable
>            Reporter: Dipietro Salvatore
>            Assignee: Dipietro Salvatore
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x
>
>         Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, 
> asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && \
> CASSANDRA_USE_JDK11=true ant jar && \
> CASSANDRA_USE_JDK11=true ant stress-build && \
> rm -rf data && bin/cassandra -f -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && \
> tools/bin/cassandra-stress write n=10000000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log -graph file=cload.html && \
> bin/nodetool compact keyspace1 && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=406 -node localhost -log file=result.log -graph file=graph.html
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
