[ https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861303#comment-16861303 ]
Alex Petrov commented on CASSANDRA-15066:
-----------------------------------------

Thank you for the patch. The patch looks good and the latest additions are great. I'm happy to +1 it; I just have several comments, most of them minor:

* The monotonic clock implementation looks like it might race: each operation is synchronized, but their combination isn't:
{code}
public void refreshNow()
{
    pauseNowSampling();
    resumeNowSampling();
}
{code}
* In {{waitUntilFlushed}}, I'd swap {{int signalWhenExcessBytesWritten, int wakeUpWhenExcessBytesWritten}}, since other methods (like {{parkUntilFlushed(long wakeUpWhenFlushed, long signalWhenFlushed)}}) take the wake-up argument first and the signal argument second.
* In {{releaseSpace}}, I'd leave a comment explaining why the thread is unparked but {{waiting}} is not set to {{null}} ({{parkUntilFlushed}} releases it).
* In {{resumeEpochSampling}} and {{resumeNowSampling}}, we can use {{scheduleWithFixedDelay(task, 0, …)}} instead of running the task upfront.
* Not sure if it's necessary, but we could save the results of {{chunk0.free()}} and {{chunk1.free()}}. But since the array was inlined, I assume performance was important.
* The second switch in the {{BufferPool}} microqueue {{removeIf}} inverts the logic in case 1. I'd add a small comment that the only two cases we need to shift are {{null, chunk, null}} and {{null, null, chunk}}.
* Should the error in approximate time be an absolute value, especially since we seem to compare two error values later? It seems it has to be the case anyway.
* In {{OutboundConnection}}, there seems to be a race condition between {{promiseToExecuteLater(); requestConnect().addListener(f -> executeAgain());}} and {{maybeExecuteAgain();}}. The thing is, {{executeAgain}} will re-run the task _only_ if {{maybeExecuteAgain}} was executed _before_ it. This works fine for small messages since we have strict ordering: {{maybeExecuteAgain}} runs on the event loop, and {{executeAgain}} will execute on the event loop, too. For large messages, however, {{maybeExecuteAgain}} is called from the large-messages thread, while {{executeAgain}} is called from the normal thread. If {{executeAgain}} runs before {{maybeExecuteAgain}}, we'll wait indefinitely. You can reproduce it by running {{testCloseIfEndpointDown}} enough times (it triggers extremely rarely, so better to run only with large messages). I wasn't able to reproduce this outside of closing.
* It'd be great to add some comments to {{FrameDecoderLegacy}}.
* In IMH, we have the same code in {{onCorruptFrame}}, {{abort}}, and {{onIntactFrame}}. I'm definitely not advocating against duplication (at least not in this particular case), but it might be good to comment the usages of this code, i.e., explain that we don't want to double-release on corrupt+expired or other combinations of frames:
{code}
if (!isExpired && !isCorrupt)
{
    releaseBuffers();
    releaseCapacity(size);
}
{code}

Short question: how is the approximate clock implementation with sampling better than just using a regular clock? Is the intention to normalize clock calls to yield the epoch timestamp, or to improve performance by sampling periodically?

Also, what are we going to do with all the TODOs? Should we create follow-up tickets for them?

The new {{OutboundConnection}} state machine looks great. I also wanted to give special props & thanks for the Verifier and some of the Queue tests. It is great to see more testing that exercises behaviours through randomization and verification, and not just unit testing. I think there's a lot to learn from these examples for everyone in the community.
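To illustrate the first point, here is a minimal, hypothetical sketch (the class and field are invented; only the method names come from the patch) of why the pair of synchronized calls in {{refreshNow}} is not atomic, and how synchronizing the combined method closes the window:

```java
// Hypothetical model of the sampling clock. Each callee is synchronized,
// but the pair in refreshNowRacy() is not: another thread can run between
// the pause and the resume and observe (or disturb) the paused state.
class SamplingClock
{
    private volatile boolean sampling = true;

    private synchronized void pauseNowSampling()  { sampling = false; }
    private synchronized void resumeNowSampling() { sampling = true; }

    // racy: the monitor is released between the two calls
    public void refreshNowRacy()
    {
        pauseNowSampling();
        resumeNowSampling();
    }

    // holding the monitor across both calls makes the pair atomic;
    // the nested synchronized acquisitions are fine because Java
    // monitors are reentrant
    public synchronized void refreshNow()
    {
        pauseNowSampling();
        resumeNowSampling();
    }

    public synchronized boolean isSampling() { return sampling; }
}
```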
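On the {{scheduleWithFixedDelay}} point, a small standalone sketch (class and method names are mine, not Cassandra's) of the suggested pattern: passing an initial delay of 0 makes the executor run the first sample immediately, so there is no need to invoke the task by hand before scheduling it:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for resumeNowSampling()/resumeEpochSampling().
class ResumeSamplingSketch
{
    // returns true if the task ran at least once without a manual first call
    public static boolean resumeSampling() throws InterruptedException
    {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch firstRun = new CountDownLatch(1);
        try
        {
            // initialDelay = 0: first execution happens immediately on the
            // executor; later executions run 100ms after each completion
            executor.scheduleWithFixedDelay(firstRun::countDown, 0, 100, TimeUnit.MILLISECONDS);
            return firstRun.await(1, TimeUnit.SECONDS);
        }
        finally
        {
            executor.shutdownNow();
        }
    }
}
```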
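The {{OutboundConnection}} ordering race can be modelled abstractly. The sketch below is deliberately simplified and hypothetical (the state machine and field names are mine, not the patch's); it only shows why {{executeAgain}} re-runs the task when {{maybeExecuteAgain}} ran first, and silently does nothing when the order is reversed, which is the indefinite wait described above:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the promise/execute handshake described in the review.
class Handshake
{
    static final int IDLE = 0, PROMISED = 1, WAITING = 2;
    final AtomicInteger state = new AtomicInteger(IDLE);
    volatile boolean reran = false;

    void promiseToExecuteLater() { state.set(PROMISED); }

    // in the large-messages case this runs on the large-messages thread
    void maybeExecuteAgain()
    {
        // record that the promise is now awaiting the connect callback
        state.compareAndSet(PROMISED, WAITING);
    }

    // runs from the connect future's listener (the "normal" thread)
    void executeAgain()
    {
        // only re-runs the task if maybeExecuteAgain() already moved the
        // state to WAITING; if it arrives first, this CAS fails and the
        // task is never re-run
        if (state.compareAndSet(WAITING, IDLE))
            reran = true;
    }
}
```

Running the two orderings against this model shows the expected ordering re-runs the task, while the reversed ordering leaves the state stuck at WAITING forever.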
> Improvements to Internode Messaging
> -----------------------------------
>
>                 Key: CASSANDRA-15066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15066
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Messaging/Internode
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: High
>             Fix For: 4.0
>
>         Attachments: 20k_backfill.png, 60k_RPS.png, 60k_RPS_CPU_bottleneck.png, backfill_cass_perf_ft_msg_tst.svg, baseline_patch_vs_30x.png, increasing_reads_latency.png, many_reads_cass_perf_ft_msg_tst.svg
>
> CASSANDRA-8457 introduced asynchronous networking to internode messaging, but there have been several follow-up endeavours to improve some semantic issues. CASSANDRA-14503 and CASSANDRA-13630 are the latest such efforts, and were combined some months ago into a single overarching refactor of the original work, to address some of the issues that have been discovered. Given the criticality of this work to the project, we wanted to bring some more eyes to bear to ensure the release goes ahead smoothly. In doing so, we uncovered a number of issues with messaging, some of which long standing, that we felt needed to be addressed. This patch widens the scope of CASSANDRA-14503 and CASSANDRA-13630 in an effort to close the book on the messaging service, at least for the foreseeable future.
> The patch includes a number of clarifying refactors that touch outside of the {{net.async}} package, and a number of semantic changes to the {{net.async}} package itself. We believe it clarifies the intent and behaviour of the code while improving system stability, which we will outline in comments below.
> https://github.com/belliottsmith/cassandra/tree/messaging-improvements

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)