[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843813#comment-17843813 ]
Stefan Miklosovic commented on CASSANDRA-19429:
-----------------------------------------------

I think that size and capacity are truly two different concepts, as already explained by Brandon and myself, and I believe that setting capacity to 0 practically disables the respective cache.

CI for 4.1 looks good with the patch below. I identified a couple more places where we could read the capacity from DatabaseDescriptor rather than calling getCapacity() on the cache to check whether it is greater than zero before proceeding. If we are going to use DD as the source of truth for whether a cache is enabled, one just needs to be careful not to forget to update DatabaseDescriptor when capacity is set to zero, e.g. from JMX.

As for other branches: as I write this, the ticket specifies only 4.0 and 4.1 as fix versions, but that should be extended up to trunk. All of this assumes there is an appetite to do this across all versions instead of just fixing the trace messages as suggested by Jon. The build below tells me that we are onto something here and that it would be just a matter of applying this elsewhere.
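To make the distinction between size and capacity concrete, here is a minimal, self-contained sketch. It is not the patch on the branch and does not use Cassandra's real classes: `Config` is a hypothetical stand-in for a configuration holder such as DatabaseDescriptor, and `LockingCache` is a hypothetical cache whose `getCapacity()` must take a lock. The sketch assumes that every capacity change is mirrored into the configuration holder, which is the caveat raised above for updates arriving over JMX.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: none of these classes are Cassandra's real ones.
final class CacheEnabledSketch {

    // Stand-in for a configuration holder such as DatabaseDescriptor.
    static final class Config {
        private static volatile long cacheCapacity;

        static long getCacheCapacity() { return cacheCapacity; }
        static void setCacheCapacity(long capacity) { cacheCapacity = capacity; }
    }

    // Stand-in cache whose getCapacity() has to take a lock.
    static final class LockingCache {
        private final ReentrantLock lock = new ReentrantLock();
        private long capacity; // maximum total weight, guarded by lock
        private long size;     // current total weight, guarded by lock
        final AtomicLong capacityLockAcquires = new AtomicLong();

        LockingCache(long capacity) { setCapacity(capacity); }

        long getCapacity() {
            lock.lock();
            try {
                capacityLockAcquires.incrementAndGet();
                return capacity;
            } finally {
                lock.unlock();
            }
        }

        // Every capacity change must also update Config, otherwise the
        // lock-free check below would report a stale answer.
        void setCapacity(long newCapacity) {
            lock.lock();
            try {
                capacity = newCapacity;
                Config.setCacheCapacity(newCapacity);
            } finally {
                lock.unlock();
            }
        }

        long size() {
            lock.lock();
            try {
                return size;
            } finally {
                lock.unlock();
            }
        }

        // Rejects the entry when it would exceed capacity; capacity 0 rejects all.
        boolean put(long weight) {
            lock.lock();
            try {
                if (size + weight > capacity) return false;
                size += weight;
                return true;
            } finally {
                lock.unlock();
            }
        }
    }

    // Contended variant: asks the cache itself and takes its lock.
    static boolean enabledViaCache(LockingCache cache) {
        return cache.getCapacity() > 0;
    }

    // Lock-free variant: a single volatile read of the mirrored setting.
    static boolean enabledViaConfig() {
        return Config.getCacheCapacity() > 0;
    }

    public static void main(String[] args) {
        LockingCache cache = new LockingCache(100);
        // An empty cache is still enabled: size is 0 while capacity is 100.
        System.out.println("size=" + cache.size() + " enabled=" + enabledViaConfig());
        cache.setCapacity(0);
        System.out.println("enabled after setCapacity(0)=" + enabledViaConfig());
    }
}
```

In this sketch a freshly created cache has size 0 but is still enabled, so a check on size cannot replace a check on capacity, while the configuration read answers the same question without touching the cache's lock.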
[CASSANDRA-19429-4.1|https://github.com/instaclustr/cassandra/tree/CASSANDRA-19429-4.1]

{noformat}
java11_pre-commit_tests
java11_separate_tests
java8_pre-commit_tests
  ✓ j8_build                                 6m 53s
  ✓ j8_cqlsh_dtests_py3                      6m 53s
  ✓ j8_cqlsh_dtests_py311                    8m 26s
  ✓ j8_cqlsh_dtests_py311_vnode              6m 53s
  ✓ j8_cqlsh_dtests_py38                      8m 3s
  ✓ j8_cqlsh_dtests_py38_vnode               7m 40s
  ✓ j8_cqlsh_dtests_py3_vnode                8m 16s
  ✓ j8_cqlshlib_cython_tests                13m 32s
  ✓ j8_cqlshlib_tests                       11m 44s
  ✓ j8_dtests                               33m 23s
  ✓ j8_dtests_vnode                         38m 26s
  ✓ j8_jvm_dtests                           19m 30s
  ✓ j8_jvm_dtests_vnode                     12m 23s
  ✓ j8_simulator_dtests                      5m 51s
  ✓ j11_jvm_dtests_vnode                    11m 50s
  ✓ j11_jvm_dtests                          19m 47s
  ✓ j11_dtests_vnode                         35m 6s
  ✓ j11_dtests                              34m 46s
  ✓ j11_cqlshlib_tests                       6m 35s
  ✓ j11_cqlshlib_cython_tests                9m 27s
  ✓ j11_cqlsh_dtests_py3_vnode               5m 47s
  ✓ j11_cqlsh_dtests_py38_vnode               6m 7s
  ✓ j11_cqlsh_dtests_py38                    5m 41s
  ✓ j11_cqlsh_dtests_py311_vnode             5m 50s
  ✓ j11_cqlsh_dtests_py311                   5m 56s
  ✓ j11_cqlsh_dtests_py3                     5m 38s
  ✕ j8_unit_tests                            9m 53s
    org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
    org.apache.cassandra.net.ConnectionTest testMessageDeliveryOnReconnect
  ✕ j8_utests_system_keyspace_directory      11m 5s
    org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
  ✕ j11_unit_tests                           8m 12s
    org.apache.cassandra.cql3.MemtableSizeTest testSize[skiplist]
java8_separate_tests
{noformat}

[java11_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4285/workflows/7470a92f-1cb3-487d-9d0a-dd1f781a79e8]
[java11_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4285/workflows/2116e61d-1e6a-4589-85e7-1980acfbdb05]
[java8_pre-commit_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4285/workflows/30501a49-c2b0-4aaf-a504-f087f27e88f7]
[java8_separate_tests|https://app.circleci.com/pipelines/github/instaclustr/cassandra/4285/workflows/232134f1-5956-4346-afa3-bd556b6c5f60]

> Remove lock contention generated by getCapacity function in SSTableReader
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19429
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/SSTable
>            Reporter: Dipietro Salvatore
>            Assignee: Dipietro Salvatore
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x
>
>         Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png, asprof_cass4.1.3__lock_20240216052912lock.html, image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock acquires is measured in the `getCapacity` function from `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), this limits the CPU utilization of the system to under 50% when testing at full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing the call to `getCapacity` with `size` achieves up to 2.95x increase in throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write n=10000000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log -graph file=cload.html && \
> bin/nodetool compact keyspace1 && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=406 -node localhost -log file=result.log -graph file=graph.html
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org