[ https://issues.apache.org/jira/browse/CASSANDRA-19949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884438#comment-17884438 ]
Stefan Miklosovic commented on CASSANDRA-19949:
-----------------------------------------------

Could you test 4.0.0? (please attach the results too)

> Count performance regression in Cassandra 4.x
> ---------------------------------------------
>
>                 Key: CASSANDRA-19949
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19949
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Romain Anselin
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x
>
>         Attachments: cass311count.txt, cass311debugcount.txt,
> cass311trace.txt, cass41count.txt, cass41debugcount.txt, cass41trace.txt,
> objcount-1.py
>
>
> Cassandra 4 exhibits a severe drop in performance on count operations.
> We created a reproduction workflow that inserts 100k rows, each a 10kb random
> string. After this data is inserted in a 3-node cluster at RF3 and queried at
> LQ, a count on said table takes
> - circa 2s on 3.11
> - consistently more than 10s on 4.0 and 4.1 (around 12 to 13s) - tested
> 4.0.10 and 4.1.5
> Observation of the same program/query against each environment:
> 3.11
> {code:java}
> # COUNT #
> 61a5bcb0-75ca-11ef-9cff-55d571fe1347
> Row count:100000
> Count timing with fetch 5000: 0:00:01.846531
> Average row size: 10000.0{code}
> 4.1
> {code:java}
> # COUNT #
> 55d79f60-75cb-11ef-a8be-399c3e257132
> Row count:100000
> Count timing with fetch 5000: 0:00:13.408626
> Average row size: 10000.0{code}
> The UUID shown in the above output is the trace ID of the query execution.
> The trace is then exported from each cluster with the command below, which
> produces the cassXXtrace.txt file
> {noformat}
> cqlsh -e "show session [trace_id]" | tee cassXXtrace.txt{noformat}
> Attached are cass311trace.txt and cass41trace.txt, which show the events
> associated with the above query.
> Note that the issue is far more pronounced in a 3-node cluster (I also tested
> on a single node in Docker, where it is less visible).
> Attaching objcount.py, which contains 2 functions to insert and read the data.
> The insert is pretty slow because it generates random junk 10k objects, but it
> allows the issue to be reproduced. Just uncomment the gateway_insert call to
> trigger the data insert.
> {code:java}
> # gateway_insert(session, ks, tbl)
> gateway_query(session, ks, tbl, fetch){code}
> Requires argparse and the cassandra driver.
> To use, run the following command. Consider uncommenting l. 40 and 41 for
> ks/table creation and l. 155 for the insert workload
> {code:java}
> python3 ./objcount.py -i <ip> -k <ks> -t <table>{code}
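
The measurement described in the report can be approximated with the sketch below. It is not the attached objcount.py: the function names (random_payload, timed, traced_count), the exact CQL text, and the reading of "LQ" as LOCAL_QUORUM are all assumptions made for illustration. The first two helpers use only the standard library; traced_count additionally assumes the DataStax cassandra-driver package and an already connected session.

```python
import random
import string
import time
from datetime import timedelta


def random_payload(size=10_000):
    """Return a random alphanumeric string of `size` characters."""
    alphabet = string.ascii_letters + string.digits
    return "".join(random.choices(alphabet, k=size))


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed), with elapsed as a timedelta."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, timedelta(seconds=time.perf_counter() - start)


def traced_count(session, ks, tbl, fetch=5000):
    """Run a traced count query and return (row_count, trace_id).

    Hypothetical helper: assumes `session` is a connected
    cassandra-driver Session and that ks/tbl are trusted identifiers.
    """
    # Imported lazily so the helpers above work without the driver installed.
    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    stmt = SimpleStatement(
        f"SELECT count(*) FROM {ks}.{tbl}",
        # Assumption: "LQ" in the report means LOCAL_QUORUM.
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
        fetch_size=fetch,
    )
    result = session.execute(stmt, trace=True)
    row = result.one()
    return row[0], result.get_query_trace().trace_id
```

With a connected session, a call such as `(count, trace_id), elapsed = timed(traced_count, session, ks, tbl, 5000)` yields a row count, a trace ID that can be passed to `show session` in cqlsh, and an elapsed time that prints in the same `0:00:01.846531` form as the timings quoted above.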