[ https://issues.apache.org/jira/browse/CASSANDRA-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17291874#comment-17291874 ]
Sergey Kandyla commented on CASSANDRA-15977:
--------------------------------------------

[~adelapena] as for

>> What's the average number of items of {{ids}}? Is it possible that the replicas are very out-of-sync?

in my tests I query one id. In the live app it can be multiple ids, but most often a query is still for one or a few.

I've collected some more information about this problem by upgrading one of our live regional clusters from 3.11.8 to 3.11.10. All benchmarks were made with the Vegeta load-testing tool, which generates a constant request rate (RPS); it is similar to ab, but a bit more accurate. The idea was not to overload the cluster, but to generate a constant request rate and observe latency under optimal conditions.

!Screenshot 2021-02-26 at 20.59.08.png|width=831,height=150!

There are 4 tests in this table: one via the live app, and the other 3 via a small Go app that sends queries to the database, to isolate any distortions from the environment (e.g. k8s and so on).

Cassandra cluster metrics during the benchmark:

!Screenshot 2021-02-22 at 16.10.53.png|width=723,height=169!

* 12:03-12:24: load test against Cassandra 3.11.8
* 12:35-12:40: upgrade of Cassandra to 3.11.10
* 12:43-13:05: the same load test against Cassandra 3.11.10

CPU load increased by 5-20%.

*Latency:*

!Screenshot 2021-02-22 at 16.07.45.png|width=737,height=181!
Avg latency P50, Cassandra 3.11.8 (metrics taken from the jolokia2 plugin)

!Screenshot 2021-02-22 at 16.07.29.png|width=741,height=133!
Avg latency P50, Cassandra 3.11.10: it actually doubled.

!Screenshot 2021-02-22 at 16.08.01.png|width=742,height=140!
Avg latency P99, Cassandra 3.11.8

!Screenshot 2021-02-22 at 16.08.17.png|width=744,height=125!
Avg latency P99, Cassandra 3.11.10: latency doubled again.
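The load tests in this comment use constant-rate (open-loop) load generation: requests start on a fixed schedule regardless of how long earlier responses take, so a slower server shows up as higher latency rather than as fewer requests per second. A minimal Java sketch of that pacing technique follows, assuming a fire-and-forget query. It is not Vegeta's implementation, and the names ConstantRateLoad, periodNanos, run and the sleeping fakeQuery are hypothetical stand-ins (the real tests used Vegeta and a small Go app issuing a single-id query).

{code:java}
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of open-loop, constant-rate load generation (not Vegeta's code).
public class ConstantRateLoad {

    /** Interval between request starts for a target rate, in nanoseconds. */
    static long periodNanos(int requestsPerSecond) {
        if (requestsPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return TimeUnit.SECONDS.toNanos(1) / requestsPerSecond;
    }

    /**
     * Starts `query` at a fixed rate for the given duration. Each tick only hands the query
     * to a worker pool, so a slow response never delays later requests (open-loop pacing).
     * Returns the number of requests started.
     */
    static int run(Runnable query, int requestsPerSecond, long durationMillis)
            throws InterruptedException {
        ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor();
        ExecutorService workers = Executors.newCachedThreadPool();
        AtomicInteger started = new AtomicInteger();

        ticker.scheduleAtFixedRate(() -> {
            started.incrementAndGet();
            workers.execute(query); // hand off; do not wait for the response
        }, 0, periodNanos(requestsPerSecond), TimeUnit.NANOSECONDS);

        Thread.sleep(durationMillis);
        ticker.shutdownNow();                        // stop issuing new requests
        ticker.awaitTermination(1, TimeUnit.SECONDS);
        workers.shutdown();                          // let in-flight requests finish
        workers.awaitTermination(5, TimeUnit.SECONDS);
        return started.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for a single-id SELECT against Cassandra (~3 ms response).
        Runnable fakeQuery = () -> {
            try {
                Thread.sleep(3);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int n = run(fakeQuery, 200, 1000);
        System.out.println("requests started: " + n);
    }
}
{code}

The unbounded cached pool is a simplification: if the server slows down, in-flight requests (and threads) accumulate instead of the request rate dropping, which is exactly the behaviour a constant-rate test is meant to expose.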
Histogram buckets:

{code:java}
Bucket          #        %       Histogram
[0s, 500µs]     0        0.00%
[500µs, 1ms]    0        0.00%
[1ms, 1.5ms]    0        0.00%
[1.5ms, 2ms]    19208    12.73%  #########
[2ms, 3ms]      100785   66.79%  ##################################################
[3ms, 4ms]      12385    8.21%   ######
[4ms, 5ms]      720      0.48%
[5ms, 6ms]      634      0.42%
[6ms, 7ms]      632      0.42%
[7ms, 8ms]      546      0.36%
[8ms, 9ms]      550      0.36%
[9ms, 10ms]     674      0.45%
{code}
Latency distribution by histogram bucket (for one of the tests), Cassandra 3.11.8

{code:java}
Bucket          #        %       Histogram
[0s, 500µs]     0        0.00%
[500µs, 1ms]    0        0.00%
[1ms, 1.5ms]    0        0.00%
[1.5ms, 2ms]    0        0.00%
[2ms, 3ms]      30852    20.45%  ###############
[3ms, 4ms]      20332    13.47%  ##########
[4ms, 5ms]      69602    46.12%  ##################################
[5ms, 6ms]      2347     1.56%   #
[6ms, 7ms]      613      0.41%
[7ms, 8ms]      354      0.23%
[8ms, 9ms]      328      0.22%
[9ms, 10ms]     341      0.23%
[10ms, 12ms]    702      0.47%
{code}
Latency distribution by histogram bucket (for one of the tests), Cassandra 3.11.10

The following metrics are not clear to me, but maybe they make sense to you.

!Screenshot 2021-02-22 at 16.14.51.png|width=844,height=159!
Load test, Cassandra 3.11.8 (left), 3.11.10 (right)

!Screenshot 2021-02-22 at 16.18.38.png|width=831,height=522!
Load test, Cassandra 3.11.8 (left), 3.11.10 (right)

!Screenshot 2021-02-22 at 16.15.12.png|width=829,height=373!
Load test, Cassandra 3.11.8 (left), 3.11.10 (right). With Cassandra 3.11.10, both Active and Pending ReadStage tasks increase, and so do Active Native Transport Requests tasks.

Finally, the latency difference over a 2-day view for real traffic:

!Screenshot 2021-02-23 at 08.40.53.png|width=835,height=167!
Read latency P95 before and after the upgrade to 3.11.10

Don't mind that I benchmarked Cassandra 3.11.8 vs 3.11.10: we started experiencing performance issues after upgrading to 3.11.9, and 3.11.10 behaves the same.
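The two Vegeta histograms make the P50 shift easy to check by hand: accumulating the % column bucket by bucket, the median falls in [2ms, 3ms] for 3.11.8 and in [4ms, 5ms] for 3.11.10. A minimal Java sketch of that arithmetic follows; the class and method names are hypothetical, and the percentages are copied from the two tables. The listed buckets cover only about 90% (3.11.8) and 83% (3.11.10) of requests, so higher percentiles such as P99 lie in buckets not shown, and the sketch returns null for them.

{code:java}
import java.util.List;

// Hypothetical helper: locate the histogram bucket that contains a given percentile.
public class HistogramPercentile {

    // One row of a Vegeta-style histogram report: bucket label and its share of all requests (%).
    record Bucket(String range, double percent) {}

    // Rows copied from the Cassandra 3.11.8 table (only the buckets listed there).
    static final List<Bucket> V3_11_8 = List.of(
        new Bucket("[0s, 500µs]", 0.00), new Bucket("[500µs, 1ms]", 0.00),
        new Bucket("[1ms, 1.5ms]", 0.00), new Bucket("[1.5ms, 2ms]", 12.73),
        new Bucket("[2ms, 3ms]", 66.79), new Bucket("[3ms, 4ms]", 8.21),
        new Bucket("[4ms, 5ms]", 0.48), new Bucket("[5ms, 6ms]", 0.42),
        new Bucket("[6ms, 7ms]", 0.42), new Bucket("[7ms, 8ms]", 0.36),
        new Bucket("[8ms, 9ms]", 0.36), new Bucket("[9ms, 10ms]", 0.45));

    // Rows copied from the Cassandra 3.11.10 table (only the buckets listed there).
    static final List<Bucket> V3_11_10 = List.of(
        new Bucket("[0s, 500µs]", 0.00), new Bucket("[500µs, 1ms]", 0.00),
        new Bucket("[1ms, 1.5ms]", 0.00), new Bucket("[1.5ms, 2ms]", 0.00),
        new Bucket("[2ms, 3ms]", 20.45), new Bucket("[3ms, 4ms]", 13.47),
        new Bucket("[4ms, 5ms]", 46.12), new Bucket("[5ms, 6ms]", 1.56),
        new Bucket("[6ms, 7ms]", 0.41), new Bucket("[7ms, 8ms]", 0.23),
        new Bucket("[8ms, 9ms]", 0.22), new Bucket("[9ms, 10ms]", 0.23),
        new Bucket("[10ms, 12ms]", 0.47));

    /**
     * Returns the label of the first non-empty bucket at which the cumulative share reaches p
     * (0-100), or null if the listed buckets never reach it (percentile is in an unlisted bucket).
     */
    static String bucketFor(List<Bucket> buckets, double p) {
        double cumulative = 0.0;
        for (Bucket b : buckets) {
            cumulative += b.percent();
            if (b.percent() > 0 && cumulative >= p) {
                return b.range();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("3.11.8  P50 bucket: " + bucketFor(V3_11_8, 50));   // [2ms, 3ms]
        System.out.println("3.11.10 P50 bucket: " + bucketFor(V3_11_10, 50));  // [4ms, 5ms]
        System.out.println("3.11.8  P99 bucket: " + bucketFor(V3_11_8, 99));   // null
    }
}
{code}

This only brackets a percentile between bucket bounds; an exact value would need the raw latencies or interpolation within the bucket.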
> 4.0 Quality: Read Repair Test Audit
> -----------------------------------
>
>                 Key: CASSANDRA-15977
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15977
>             Project: Cassandra
>          Issue Type: Task
>          Components: Test/dtest/java, Test/unit
>            Reporter: Andres de la Peña
>            Assignee: Andres de la Peña
>            Priority: Normal
>             Fix For: 3.11.9, 4.0, 4.0-beta3
>
>         Attachments: Screenshot 2021-02-05 at 18.01.10.png, Screenshot 2021-02-22 at 16.07.29.png,
> Screenshot 2021-02-22 at 16.07.45.png, Screenshot 2021-02-22 at 16.08.01.png,
> Screenshot 2021-02-22 at 16.08.17.png, Screenshot 2021-02-22 at 16.10.53.png,
> Screenshot 2021-02-22 at 16.14.51.png, Screenshot 2021-02-22 at 16.15.12.png,
> Screenshot 2021-02-22 at 16.18.38.png, Screenshot 2021-02-23 at 08.40.53.png,
> Screenshot 2021-02-26 at 20.59.08.png
>
>          Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> This is a subtask of CASSANDRA-15579 focusing on read repair.
> [This document|https://docs.google.com/document/d/1-gldHcdLSMRbDhhI8ahs_tPeAZsjurjXr38xABVjWHE/edit?usp=sharing]
> lists and describes the existing functional tests for read repair, so we can
> have a broad view of what is currently covered. We can comment on this
> document and add ideas for new cases/tests, so it can gradually evolve to a
> more or less detailed test plan.