[ https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861303#comment-16861303 ]

Alex Petrov commented on CASSANDRA-15066:
-----------------------------------------

Thank you for the patch.

The patch looks good and the latest additions are great. I'm happy to +1 it; I
just have several comments, most of them minor:

  * looks like the monotonic clock implementation might race; each operation
is synchronized, but their combination isn't:

{code}
        public void refreshNow()
        {
            // each call below is individually synchronized, but another
            // thread can interleave between the pause and the resume
            pauseNowSampling();
            resumeNowSampling();
        }
{code}
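
For illustration, a minimal sketch of one way to make the pair atomic,
assuming both sampling methods synchronize on the same monitor (the class and
field names are hypothetical, not the patch's code):

{code}
public class MonotonicClockSketch
{
    private volatile long sampledNow;

    public synchronized void pauseNowSampling()
    {
        // cancel the scheduled sampling task (elided)
    }

    public synchronized void resumeNowSampling()
    {
        sampledNow = System.nanoTime();
        // restart the scheduled sampling task (elided)
    }

    // synchronized is reentrant, so holding the monitor across both calls
    // makes the pause/resume pair atomic with respect to other callers
    public synchronized void refreshNow()
    {
        pauseNowSampling();
        resumeNowSampling();
    }
}
{code}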

  * in {{waitUntilFlushed}} I'd swap {{int signalWhenExcessBytesWritten, int
wakeUpWhenExcessBytesWritten}}, since other methods (like
{{parkUntilFlushed(long wakeUpWhenFlushed, long signalWhenFlushed)}}) put the
wake-up parameter first and the signal parameter second
  * in {{releaseSpace}} I'd leave a comment explaining why the thread is
unparked but {{waiting}} is not set to {{null}} ({{parkUntilFlushed}} releases
it)
  * in {{resumeEpochSampling}} and {{resumeNowSampling}} we can use
{{scheduleWithFixedDelay(task, 0, … )}} instead of running the task upfront
(see the first sketch after this list)
  * Not sure if it's necessary, but we could save the results of
{{chunk0.free()}} and {{chunk1.free()}}. But since the array was inlined, I
assume performance was important.
  * the second switch in the {{BufferPool}} microqueue {{removeIf}} inverts
the logic of case 1. I'd add a small comment that the only two cases where we
need to shift are {{null, chunk, null}} and {{null, null, chunk}}.
  * Should the error in approximate time be an absolute value, especially
since we seem to compare two error values later? It seems it has to be the
case anyway.
  * In {{OutboundConnection}}, there seems to be a race condition between
{{promiseToExecuteLater(); requestConnect().addListener(f ->
executeAgain());}} and {{maybeExecuteAgain();}}. The thing is,
{{executeAgain}} will run _only_ if {{maybeExecuteAgain}} was executed
_before_ it. This works fine for small messages, since we have strict
ordering: {{maybeExecuteAgain}} runs on the event loop, and {{executeAgain}}
executes on the event loop, too. For large messages, however,
{{maybeExecuteAgain}} is called from the large-messages thread, while
{{executeAgain}} is called from the normal thread. If {{executeAgain}} runs
before {{maybeExecuteAgain}}, we'll wait indefinitely. You can reproduce it by
running {{testCloseIfEndpointDown}} enough times (it triggers extremely
rarely, so better to run only with large messages); I wasn't able to
reproduce this outside of closing. (The second sketch after this list
illustrates the ordering hazard in the abstract.)
  * it’d be great to add some comments to {{FrameDecoderLegacy}}
  * in IMH, we have the same code in {{onCorruptFrame}}, {{abort}}, and
{{onIntactFrame}}. I'm definitely not advocating against duplication (at
least not in this particular case), but it might be good to comment the
instances where this code is used (see below), i.e., explain that we don't
want to double-release on corrupt+expired or other combinations of frames.

{code}
            // release buffers and capacity only when the frame is neither
            // expired nor corrupt, so we never double-release on
            // corrupt+expired or other combinations of frames
            if (!isExpired && !isCorrupt)
            {
                releaseBuffers();
                releaseCapacity(size);
            }
{code}
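
On the {{scheduleWithFixedDelay}} point above, a minimal sketch of the
suggestion, assuming a plain {{ScheduledExecutorService}} (the executor and
method names are illustrative, not the patch's code):

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ResumeSamplingSketch
{
    private static final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    public static void resumeNowSampling(Runnable sampleNow, long delayMillis)
    {
        // an initial delay of 0 makes the executor run the task immediately,
        // so there is no need to invoke it manually before scheduling
        executor.scheduleWithFixedDelay(sampleNow, 0, delayMillis, TimeUnit.MILLISECONDS);
    }
}
{code}

And on the {{OutboundConnection}} race, an abstract illustration of the
ordering hazard; the names mirror the prose above, not the actual code:

{code}
import java.util.concurrent.atomic.AtomicBoolean;

// executeAgain() is a no-op unless maybeExecuteAgain() has already run, so
// if the connect listener fires first, the wakeup is lost and nothing ever
// reschedules the work.
public class LostWakeupSketch
{
    private final AtomicBoolean readyToRun = new AtomicBoolean();

    // called after promiseToExecuteLater(), possibly from another thread
    void maybeExecuteAgain()
    {
        readyToRun.set(true);
    }

    // called from the connect future's listener
    void executeAgain()
    {
        if (readyToRun.compareAndSet(true, false))
            runWork();
        // else: lost wakeup -- maybeExecuteAgain() hasn't run yet, and
        // nothing will call executeAgain() again
    }

    void runWork() { /* elided */ }
}
{code}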

Short question: how is the approximate clock implementation with sampling
better than just using a regular clock? Is the intention to normalize clock
calls to yield the epoch timestamp, or is it to improve performance by
sampling only periodically?
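
To illustrate what I mean by sampling, here's the general pattern as I
understand it; a generic sketch, not the patch's code:

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// An approximate clock: readers see a cached timestamp that a background
// task refreshes periodically, so the hot path avoids a system call per
// read, at the cost of an error bounded by the sampling interval.
public class ApproximateClockSketch
{
    private static volatile long nowMillis = System.currentTimeMillis();

    static
    {
        ScheduledExecutorService sampler = Executors.newSingleThreadScheduledExecutor();
        sampler.scheduleWithFixedDelay(
                () -> nowMillis = System.currentTimeMillis(), 0, 10, TimeUnit.MILLISECONDS);
    }

    public static long approximateNowMillis()
    {
        return nowMillis;
    }
}
{code}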

Also, what are we going to do with all the TODOs? Should we create follow-up
tickets for them?

The new {{OutboundConnection}} state machine looks great. Also, I wanted to
give special props and thanks for the Verifier and some of the Queue tests.
It's great to see more testing that exercises behaviours through
randomization and verification, not just unit tests. I think there's a lot
for everyone in the community to learn from these examples.

> Improvements to Internode Messaging
> -----------------------------------
>
>                 Key: CASSANDRA-15066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15066
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Messaging/Internode
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: High
>             Fix For: 4.0
>
>         Attachments: 20k_backfill.png, 60k_RPS.png, 
> 60k_RPS_CPU_bottleneck.png, backfill_cass_perf_ft_msg_tst.svg, 
> baseline_patch_vs_30x.png, increasing_reads_latency.png, 
> many_reads_cass_perf_ft_msg_tst.svg
>
>
> CASSANDRA-8457 introduced asynchronous networking to internode messaging, but 
> there have been several follow-up endeavours to improve some semantic issues. 
>  CASSANDRA-14503 and CASSANDRA-13630 are the latest such efforts, and were 
> combined some months ago into a single overarching refactor of the original 
> work, to address some of the issues that have been discovered.  Given the 
> criticality of this work to the project, we wanted to bring some more eyes to 
> bear to ensure the release goes ahead smoothly.  In doing so, we uncovered a 
> number of issues with messaging, some of which long standing, that we felt 
> needed to be addressed.  This patch widens the scope of CASSANDRA-14503 and 
> CASSANDRA-13630 in an effort to close the book on the messaging service, at 
> least for the foreseeable future.
> The patch includes a number of clarifying refactors that touch outside of the 
> {{net.async}} package, and a number of semantic changes to the {{net.async}} 
> package itself.  We believe it clarifies the intent and behaviour of the 
> code while improving system stability, which we will outline in comments 
> below.
> https://github.com/belliottsmith/cassandra/tree/messaging-improvements


