[ 
https://issues.apache.org/jira/browse/CASSANDRA-14746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17214606#comment-17214606
 ] 

Josh McKenzie commented on CASSANDRA-14746:
-------------------------------------------

{quote}Our hope is that we can invest the time and money ahead of time instead 
of after the release for 4.0.
{quote}
There's that saying that "an ounce of prevention is worth a pound of cure" for 
a reason. :)

I'll dig around and see if I can surface any other large-scale performance 
testing. While I'd like us to move the needle on this one (as you and Vinay et 
al's work is doing), the ruthless pragmatist in me advocates for confidence in 
>= the performance of previous C* versions and getting the GA out for users and 
us iterating.

With as extensive as the changes in the MS are, the testing you all have 
enumerated here on top of all the other smaller cluster perf testing devs have 
done seems like a reasonable suite to have adequate confidence in the "no 
regression" stake in the ground. 

> Ensure Netty Internode Messaging Refactor is Solid
> --------------------------------------------------
>
>                 Key: CASSANDRA-14746
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14746
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Legacy/Streaming and Messaging
>            Reporter: Joey Lynch
>            Assignee: Joey Lynch
>            Priority: Normal
>              Labels: 4.0-QA
>             Fix For: 4.0-beta
>
>
> Before we release 4.0 let's ensure that the internode messaging refactor is 
> 100% solid. As internode messaging is naturally used in many code paths and 
> widely configurable we have a large number of cluster configurations and test 
> configurations that must be vetted.
> We plan to vary the following:
>  * Version of Cassandra 3.0.17 vs 4.0-alpha
>  * Cluster sizes with *multi-dc* deployments ranging from 6 - 100 nodes
>  * Client request rates varying between 1k QPS and 100k QPS of varying sizes 
> and shapes (BATCH, INSERT, SELECT point, SELECT range, etc ...)
>  * Internode compression
>  * Internode SSL (as well as openssl vs jdk)
>  * Internode Coalescing options
> We are looking to measure the following as appropriate:
>  * Latency distributions of reads and writes (lower is better)
>  * Scaling limit, aka maximum throughput before violating p99 latency 
> deadline of 10ms @ LOCAL_QUORUM, on a fixed hardware deployment for 100% 
> writes, 100% reads and 50-50 writes+reads (higher is better)
>  * Thread counts (lower is better)
>  * Context switches (lower is better)
>  * On-CPU time of tasks (higher periods without context switch is better)
>  * GC allocation rates / throughput for a fixed size heap (lower allocation 
> better)
>  * Streaming recovery time for a single node failure, i.e. can Cassandra 
> saturate the NIC
>  
> The goal is that 4.0 should have better latency, more throughput, fewer 
> threads, fewer context switches, less GC allocation, and faster recovery 
> time. I'm putting Jason Brown as the reviewer since he implemented most of 
> the internode refactor.
> Current collaborators driving this QA task: Dinesh Joshi, Jordan West, Joey 
> Lynch (Netflix), Vinay Chella (Netflix)
> Owning committer(s): Jason Brown



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to