[ https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827395#comment-16827395 ]
Joseph Lynch commented on CASSANDRA-15066:
------------------------------------------

Note, this is not a comparative analysis, nor do we have root causes for all findings. [~vinaykumarcse] and my goal today was to kick the tires of this patch and see if there were any serious issues. We threw [{{9b0b814add}}|https://github.com/apache/cassandra/commit/9b0b814add8da5a66c12a87a7bfebb015667f293] on a smallish cluster and punished it with read and write load.

*Test setup*
* Two datacenters, approximately 70ms apart
* 6 {{i3.2xlarge}} nodes per datacenter (4 physical cores with 8 threads, 60GiB of memory, 1.8 TB NVMe dedicated drive, 2 Gbps network)
* 3 node NDBench cluster generating {{LOCAL_ONE}} random writes and full partition reads of ~4kb partitions consisting of 2 rows of 10 columns each. Total dataset per node was about 180GiB and reads were uniformly distributed across the partition space. The table was mostly defaults (RF=3) except it used Leveled Compaction Strategy and no compression (since the data is random).

*First test, bootstrap (aka punish with writes)*

In this test we used NDBench's backfill feature to punish the cluster with writes.
* Backfilling the dataset achieved sustained write throughput of 20k coordinator level WPS easily, with average latencies staying below 1ms
* The limiting factor appeared to be compaction throughput
* Flamegraphs are attached

There were no observed hints or dropped messages, and data sizes in both datacenters looked reasonably consistent. I think this went very well.

*Second test, establish baseline*

Next, we sent a reasonably modest 1200 coordinator RPS and 600 WPS, which is very light load, and compared this patch to our production 30x branch.
* Writes are ~20% faster, like we saw previously in netty trunk vs 30x
* Reads are *~500%* slower. This is new since our last tests; from the flamegraph, [~benedict] suspects, and I agree, that it was likely related to some of the TR cleanup
* Checked the virtual table metrics and they seem reasonable; also spot checked some of the new JMX per-channel metrics

Summary: The read latency is concerning, but I think Benedict may already have the fix.

*Third test, punish with reads*

Due to the poor baseline read performance, we attempted to push the reads as far as they would go while acquiring a flamegraph to debug where we are spending time.
* We were able to push the cluster to 60,000 coordinator RPS before we started seeing CPU queuing.
* Flamegraphs are attached

tpstats showed relatively little queueing or QOS issues, and local read latencies remained fast, so we believe that there is a different issue at play in the read path. Flamegraphs are attached for debugging.

*Fourth test, punish with reads and writes*

We're currently attempting a mixed mode test where we do many reads and writes and see how they interact. Results will be posted shortly. I think we'll need to bump our branch to pick up the latest changes.

*Summary*

So far this patch looks to be doing a great job. We have some issues to figure out with the reads and many more tests to run, but it didn't explode, so that is good heh.

> Improvements to Internode Messaging
> -----------------------------------
>
> Key: CASSANDRA-15066
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15066
> Project: Cassandra
> Issue Type: Improvement
> Components: Messaging/Internode
> Reporter: Benedict
> Assignee: Benedict
> Priority: High
> Fix For: 4.0
>
>
> CASSANDRA-8457 introduced asynchronous networking to internode messaging, but
> there have been several follow-up endeavours to improve some semantic issues.
> CASSANDRA-14503 and CASSANDRA-13630 are the latest such efforts, and were
> combined some months ago into a single overarching refactor of the original
> work, to address some of the issues that have been discovered. Given the
> criticality of this work to the project, we wanted to bring some more eyes to
> bear to ensure the release goes ahead smoothly. In doing so, we uncovered a
> number of issues with messaging, some of them long-standing, that we felt
> needed to be addressed. This patch widens the scope of CASSANDRA-14503 and
> CASSANDRA-13630 in an effort to close the book on the messaging service, at
> least for the foreseeable future.
> The patch includes a number of clarifying refactors that touch outside of the
> {{net.async}} package, and a number of semantic changes to the {{net.async}}
> package itself. We believe it clarifies the intent and behaviour of the
> code while improving system stability, which we will outline in comments
> below.
> https://github.com/belliottsmith/cassandra/tree/messaging-improvements
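
As a rough back-of-envelope companion to the write backfill test in the comment above, the sketch below estimates how many replica-level writes each node might absorb at 20k coordinator WPS. It rests on an assumption the comment does not state: that RF=3 applies in each of the two datacenters. The function name and parameters are hypothetical and exist only for this illustration; the result is an average, not a measured value.

```python
# Back-of-envelope write fan-out for the backfill test.
# ASSUMPTION (not stated in the comment): RF=3 applies per datacenter,
# so every coordinator write is applied by 3 replicas in each of 2 DCs.

def replica_writes_per_node(coordinator_wps, rf_per_dc, datacenters, nodes_per_dc):
    """Average replica-level writes per second landing on each node."""
    total_replica_writes = coordinator_wps * rf_per_dc * datacenters
    total_nodes = nodes_per_dc * datacenters
    return total_replica_writes / total_nodes

# Figures from the test setup: 20k coordinator WPS, 2 DCs, 6 nodes per DC.
per_node = replica_writes_per_node(20_000, rf_per_dc=3, datacenters=2, nodes_per_dc=6)
print(f"~{per_node:.0f} replica writes/s per node (under the RF assumption)")
```

If RF=3 were instead spread across both datacenters in some other way, only the `rf_per_dc` and `datacenters` arguments would need to change.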