[ https://issues.apache.org/jira/browse/CASSANDRA-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291874#comment-17291874 ]

Sergey Kandyla commented on CASSANDRA-15977:
--------------------------------------------

[~adelapena], as for
>> What's the average number of items of {{ids}}? Is it possible that the 
>> replicas are very out-of-sync?
in my tests I query a single id. In the live app there can be multiple ids, but 
most often the query is still for one id or a few.

I've collected some more info about this problem by upgrading one of our live 
regional clusters from 3.11.8 to 3.11.10.
All benchmarks were made with the Vegeta load-testing tool, which generates a 
constant request rate (RPS), much like ab but a bit more accurate.
The idea was not to overload the cluster, but to generate a constant request 
rate and observe the latency under optimal conditions.
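
For anyone who wants to reproduce the constant-rate idea without Vegeta, here is a minimal sketch of a fixed-schedule load loop. It is not how Vegeta is implemented: the names {{send_offsets}} and {{run_constant_rate}} are made up for illustration, {{query}} stands in for whatever call hits the database, and the loop is single-threaded, so it only holds the target rate while each query finishes faster than the interval between sends.

```python
import time
from typing import Callable, List


def send_offsets(rate_per_sec: int, duration_sec: int) -> List[float]:
    """Offsets (seconds from start) at which each request should be issued."""
    total = rate_per_sec * duration_sec
    return [i / rate_per_sec for i in range(total)]


def run_constant_rate(query: Callable[[], None],
                      rate_per_sec: int,
                      duration_sec: int) -> List[float]:
    """Call query() on a fixed schedule; return per-request latencies in seconds."""
    start = time.perf_counter()
    latencies = []
    for offset in send_offsets(rate_per_sec, duration_sec):
        # Sleep until the scheduled send time. The schedule is fixed up front,
        # so a slow response does not lower the target rate of later requests.
        delay = start + offset - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        t0 = time.perf_counter()
        query()  # hypothetical: replace with the real database call
        latencies.append(time.perf_counter() - t0)
    return latencies
```

Keeping the rate fixed, rather than sending as fast as responses come back, is what makes the latency numbers of two versions comparable under the same load.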


!Screenshot 2021-02-26 at 20.59.08.png|width=831,height=150!
There are 4 tests in this table: one via the live app, and the other 3 via a 
small Go app that queries the database directly, to isolate any distortions 
from the environment (e.g. k8s and so on).

Cassandra cluster metrics during the benchmark: 
!Screenshot 2021-02-22 at 16.10.53.png|width=723,height=169!
Where 12:03-12:24 is the load test for Cassandra 3.11.8,
12:35-12:40 is the upgrade of Cassandra to 3.11.10,
12:43-13:05 is the same load test for Cassandra 3.11.10.

CPU load increased by 5-20%.

*Latency:*
!Screenshot 2021-02-22 at 16.07.45.png|width=737,height=181!
Avg latency P50, cassandra 3.11.8  (metrics taken from jolokia2 plugin)

!Screenshot 2021-02-22 at 16.07.29.png|width=741,height=133!
Avg latency P50, cassandra 3.11.10 - actually doubled.

!Screenshot 2021-02-22 at 16.08.01.png|width=742,height=140!
Avg latency P99, cassandra 3.11.8

!Screenshot 2021-02-22 at 16.08.17.png|width=744,height=125!
Avg latency P99, cassandra 3.11.10 - latency doubled again.

Histogram Buckets:
{code:java}
Bucket           #       %       Histogram
[0s,     500µs]  0       0.00%
[500µs,  1ms]    0       0.00%
[1ms,    1.5ms]  0       0.00%
[1.5ms,  2ms]    19208   12.73%  #########
[2ms,    3ms]    100785  66.79%  ##################################################
[3ms,    4ms]    12385   8.21%   ######
[4ms,    5ms]    720     0.48%
[5ms,    6ms]    634     0.42%
[6ms,    7ms]    632     0.42%
[7ms,    8ms]    546     0.36%
[8ms,    9ms]    550     0.36%
[9ms,    10ms]   674     0.45%
{code}
 

Latency distribution (for one of the tests) by histogram buckets, Cassandra 3.11.8


{code:java}
Bucket           #      %       Histogram
[0s,     500µs]  0      0.00%
[500µs,  1ms]    0      0.00%
[1ms,    1.5ms]  0      0.00%
[1.5ms,  2ms]    0      0.00%
[2ms,    3ms]    30852  20.45%  ###############
[3ms,    4ms]    20332  13.47%  ##########
[4ms,    5ms]    69602  46.12%  ##################################
[5ms,    6ms]    2347   1.56%   #
[6ms,    7ms]    613    0.41%
[7ms,    8ms]    354    0.23%
[8ms,    9ms]    328    0.22%
[9ms,    10ms]   341    0.23%
[10ms,   12ms]   702    0.47%
{code}
Latency distribution (for one of the tests) by histogram buckets, Cassandra 3.11.10
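
To make the shift between the two histograms easier to see, here is a small sketch that takes the bucket rows exactly as printed above and reports the most populated bucket and the share of requests that completed within a given bound. The helper names ({{modal_bucket}}, {{percent_within}}) are mine, and the percentages are the ones reported by the tool; the printed tables stop at 10ms / 12ms, so slower requests are not represented here.

```python
# Rows copied from the histograms above: (lower_ms, upper_ms, count, percent).
V3_11_8 = [
    (0, 0.5, 0, 0.00), (0.5, 1, 0, 0.00), (1, 1.5, 0, 0.00),
    (1.5, 2, 19208, 12.73), (2, 3, 100785, 66.79), (3, 4, 12385, 8.21),
    (4, 5, 720, 0.48), (5, 6, 634, 0.42), (6, 7, 632, 0.42),
    (7, 8, 546, 0.36), (8, 9, 550, 0.36), (9, 10, 674, 0.45),
]
V3_11_10 = [
    (0, 0.5, 0, 0.00), (0.5, 1, 0, 0.00), (1, 1.5, 0, 0.00),
    (1.5, 2, 0, 0.00), (2, 3, 30852, 20.45), (3, 4, 20332, 13.47),
    (4, 5, 69602, 46.12), (5, 6, 2347, 1.56), (6, 7, 613, 0.41),
    (7, 8, 354, 0.23), (8, 9, 328, 0.22), (9, 10, 341, 0.23),
    (10, 12, 702, 0.47),
]


def modal_bucket(rows):
    """Return (lower_ms, upper_ms) of the bucket with the highest count."""
    lower, upper, _count, _pct = max(rows, key=lambda r: r[2])
    return (lower, upper)


def percent_within(rows, limit_ms):
    """Sum the reported percentages of all buckets ending at or below limit_ms."""
    return round(sum(pct for _lo, hi, _cnt, pct in rows if hi <= limit_ms), 2)


for name, rows in (("3.11.8", V3_11_8), ("3.11.10", V3_11_10)):
    print(name, "modal bucket:", modal_bucket(rows),
          "within 3ms:", percent_within(rows, 3), "%")
```

With these rows the most populated bucket moves from [2ms, 3ms] on 3.11.8 to [4ms, 5ms] on 3.11.10, and the share of requests answered within 3ms drops from 79.52% to 20.45%.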


The following metrics are not clear to me, but maybe they make sense to you.

!Screenshot 2021-02-22 at 16.14.51.png|width=844,height=159!
loadtest cassandra 3.11.8 (left), 3.11.10(right)

!Screenshot 2021-02-22 at 16.18.38.png|width=831,height=522!
loadtest cassandra 3.11.8 (left), 3.11.10(right). 

!Screenshot 2021-02-22 at 16.15.12.png|width=829,height=373!
loadtest cassandra 3.11.8 (left), 3.11.10 (right). ReadStage shows an increase 
in both Active and Pending tasks on Cassandra 3.11.10. Native Transport 
Requests also shows an increase in Active tasks.

Finally, the latency difference in a 2-day view for real traffic:
!Screenshot 2021-02-23 at 08.40.53.png|width=835,height=167!
Latency P95 read before and after upgrade to 3.11.10

Never mind that I benchmarked Cassandra 3.11.8 against 3.11.10 rather than 
3.11.9: we started experiencing performance issues after upgrading to 3.11.9, 
and 3.11.10 behaves the same.

> 4.0 Quality: Read Repair Test Audit
> -----------------------------------
>
>                 Key: CASSANDRA-15977
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15977
>             Project: Cassandra
>          Issue Type: Task
>          Components: Test/dtest/java, Test/unit
>            Reporter: Andres de la Peña
>            Assignee: Andres de la Peña
>            Priority: Normal
>             Fix For: 3.11.9, 4.0, 4.0-beta3
>
>         Attachments: Screenshot 2021-02-05 at 18.01.10.png, Screenshot 
> 2021-02-22 at 16.07.29.png, Screenshot 2021-02-22 at 16.07.45.png, Screenshot 
> 2021-02-22 at 16.08.01.png, Screenshot 2021-02-22 at 16.08.17.png, Screenshot 
> 2021-02-22 at 16.10.53.png, Screenshot 2021-02-22 at 16.14.51.png, Screenshot 
> 2021-02-22 at 16.15.12.png, Screenshot 2021-02-22 at 16.18.38.png, Screenshot 
> 2021-02-23 at 08.40.53.png, Screenshot 2021-02-26 at 20.59.08.png
>
>          Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> This is a subtask of CASSANDRA-15579 focusing on read repair.
> [This 
> document|https://docs.google.com/document/d/1-gldHcdLSMRbDhhI8ahs_tPeAZsjurjXr38xABVjWHE/edit?usp=sharing]
>  lists and describes the existing functional tests for read repair, so we can 
> have a broad view of what is currently covered. We can comment on this 
> document and add ideas for new cases/tests, so it can gradually evolve to a 
> more or less detailed test plan.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

