Romain Anselin created CASSANDRA-19949:
------------------------------------------

             Summary: Count performance regression in Cassandra 4.x
                 Key: CASSANDRA-19949
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19949
             Project: Cassandra
          Issue Type: Bug
            Reporter: Romain Anselin
         Attachments: cass311count.txt, cass311debugcount.txt, 
cass311trace.txt, cass41count.txt, cass41debugcount.txt, cass41trace.txt, 
objcount-1.py

Cassandra 4 exhibits a severe drop in performance on count operations.

We created a reproduction workflow that inserts 100k rows, each holding a 10 KB random string.
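
The attached script is not reproduced in this message, so the following is only a minimal sketch of how such a data set could be generated. The names random_payload and generate_rows are hypothetical, and the 10,000-character payload size is an assumption based on the "Average row size: 10000.0" line in the program output.

```python
import random
import string

ROW_COUNT = 100_000    # number of rows stated in the report
PAYLOAD_SIZE = 10_000  # assumed size of one "10kb" string, in characters


def random_payload(size=PAYLOAD_SIZE, rng=random):
    """Return a random alphanumeric string of exactly `size` characters."""
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choices(alphabet, k=size))


def generate_rows(count=ROW_COUNT, size=PAYLOAD_SIZE):
    """Yield (id, payload) pairs lazily so the full data set never sits in memory."""
    for row_id in range(count):
        yield row_id, random_payload(size)
```

Each generated pair would then be bound to an INSERT statement. Building the payloads lazily keeps memory use flat, at the cost of the slow insert phase the report mentions.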

After this data is inserted into a 3-node cluster at RF=3 and queried at 
LOCAL_QUORUM (LQ), a count on the table takes
- circa 2s on 3.11
- consistently more than 10s on 4.0 and 4.1 (around 12 to 13s); tested on 
4.0.10 and 4.1.5

Output of the same program/query against each environment:

3.11
{code:java}
# COUNT #
61a5bcb0-75ca-11ef-9cff-55d571fe1347
Row count:100000
Count timing with fetch 5000: 0:00:01.846531
Average row size: 10000.0{code}

4.1
{code:java}
# COUNT #
55d79f60-75cb-11ef-a8be-399c3e257132
Row count:100000
Count timing with fetch 5000: 0:00:13.408626
Average row size: 10000.0{code}
The UUID shown in the output above is the trace ID of the query execution. 
The trace is then exported from each cluster with the command below, which 
produces the cassXXtrace.txt file:
{noformat}
cqlsh -e "show session [trace_id]" | tee cassXXtrace.txt{noformat}

The attached cass311trace.txt and cass41trace.txt show the events associated 
with the query above.
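
To put a number on the regression, the two "Count timing" values reported above can be compared directly. This is a small sketch using only the figures from this report; parse_timing is a hypothetical helper that assumes the timings are printed in the H:MM:SS.ffffff form shown above.

```python
from datetime import timedelta


def parse_timing(text):
    """Parse an 'H:MM:SS.ffffff' string into a timedelta."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


t311 = parse_timing("0:00:01.846531")  # 3.11 run
t41 = parse_timing("0:00:13.408626")   # 4.1 run

# Dividing two timedeltas yields a plain float ratio.
slowdown = t41 / t311
print(f"3.11: {t311.total_seconds():.3f}s  4.1: {t41.total_seconds():.3f}s  "
      f"slowdown: {slowdown:.1f}x")
```

For this single pair of runs, the 4.1 count is roughly seven times slower than the 3.11 count.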

Note that the issue is much more pronounced in a 3-node cluster (I also tested 
a single node on Docker, where it is less visible).

Attaching objcount.py, which contains 2 functions to insert and read the data. 
The insert is pretty slow because it generates random 10 KB junk objects, but 
it allows the issue to be reproduced. Just uncomment the gateway_insert call to 
trigger the data insert.
{code:java}
    # gateway_insert(session, ks, tbl)
    gateway_query(session, ks, tbl, fetch){code}
Requires argparse and the Cassandra driver.
To use, run the following command. Consider uncommenting l. 40 and 41 for 
keyspace/table creation and l. 155 for the insert workload.
{code:java}
python3 ./objcount.py -i <ip> -k <ks> -t <table>{code}
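
For readers without access to the attachment, the following is a hedged sketch of what a timed, traced count could look like with the Python cassandra-driver. It is not the attached script. The names count_query and timed_count are hypothetical, and the use of a server-side SELECT COUNT(*) at LOCAL_QUORUM with a fetch size of 5000 is an assumption inferred from the output shown earlier.

```python
import time
from datetime import timedelta


def count_query(keyspace, table):
    """Build the CQL text; identifiers cannot be bound as query parameters."""
    return f"SELECT COUNT(*) FROM {keyspace}.{table}"


def timed_count(session, keyspace, table, fetch_size=5000):
    """Run a traced count at LOCAL_QUORUM; return (rows, elapsed, trace_id)."""
    # Driver imports are local so count_query() stays usable without the driver.
    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    statement = SimpleStatement(
        count_query(keyspace, table),
        fetch_size=fetch_size,
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    )
    start = time.monotonic()
    result = session.execute(statement, trace=True)  # trace=True records a trace ID
    row_count = result.one()[0]
    elapsed = timedelta(seconds=time.monotonic() - start)
    trace_id = result.get_query_trace().trace_id
    return row_count, elapsed, trace_id
```

Given a connected session (for example from cassandra.cluster.Cluster([ip]).connect()), timed_count(session, ks, tbl) returns the row count, the wall-clock duration, and the trace ID that can then be passed to cqlsh for export.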



