[jira] [Commented] (CASSANDRA-15066) Improvements to Internode Messaging

Benedict (JIRA) Mon, 25 Mar 2019 18:06:24 -0700


    [ https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801283#comment-16801283 ]


Benedict commented on CASSANDRA-15066:
--------------------------------------

h2. Outbound Connections

h3. Opening a connection
We now behave consistently for every kind of connection failure, including refusal by the endpoint, incompatible versions, a bad handshake, and unexpected exceptions; namely, we:
* Retry indefinitely, until we either succeed or have no messages waiting to be delivered
* Wait incrementally longer before each reconnection attempt, up to a maximum of 1s (instead of yo-yo-ing between 100ms and 1s)
* Do not acquire reserve queue limits while failing to connect (any reserve already claimed is held until its messages expire normally)
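As a rough illustration of the retry policy above, here is a minimal Java sketch. The class {{ReconnectBackoff}}, its methods, the 100ms starting delay and the doubling schedule are all assumptions made for illustration (the comment only says delays grow incrementally up to 1s); it is not the patch's code.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of connect-retry pacing: every failure kind (refused, version
 * mismatch, bad handshake, unexpected exception) goes through the same onFailure() path.
 * INITIAL_DELAY_MS and the doubling policy are assumptions, not Cassandra's values.
 */
final class ReconnectBackoff {
    static final long INITIAL_DELAY_MS = 100;                      // assumed first delay
    static final long MAX_DELAY_MS = TimeUnit.SECONDS.toMillis(1); // 1s cap from the comment

    private int consecutiveFailures;

    /** Records a failed attempt of any kind and returns how long to wait before retrying. */
    long onFailure() {
        // Bound the shift so the intermediate value cannot overflow; the cap applies anyway.
        long delay = INITIAL_DELAY_MS << Math.min(consecutiveFailures, 10);
        consecutiveFailures++;
        return Math.min(delay, MAX_DELAY_MS);
    }

    /** A successful connection resets the pacing. */
    void onSuccess() {
        consecutiveFailures = 0;
    }

    /** Retry indefinitely, but only while something is still waiting to be delivered. */
    static boolean shouldRetry(long pendingMessages) {
        return pendingMessages > 0;
    }
}
```

A connect loop would call {{onFailure()}} after any failed attempt, schedule the next attempt after the returned delay, and stop once {{shouldRetry}} reports an empty backlog.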

h3. Closing a connection
* Correctly drains outbound messages that are waiting to be delivered (unless we are disconnected and fail to reconnect)
* Messages written to a closing connection are either delivered or rejected, with a new connection being opened if the old one is irrevocably closed
* Unused connections are eventually pruned

Cassandra does send messages to logically dead endpoints, due to races or other conditions; today, these effectively leak connections, as no attempt to close the connection follows.
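One way to picture eventual pruning, assuming a connection may be closed once it has been idle for some timeout and has nothing left to drain. {{IdleConnectionPruner}}, its methods, the string endpoint keys and the injected clock are hypothetical; the sketch only tracks bookkeeping and does not touch real sockets.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical idle-connection bookkeeping; not Cassandra's API. Single-threaded for simplicity. */
final class IdleConnectionPruner {
    private final long idleTimeoutNanos;                              // assumed, illustrative threshold
    private final Map<String, Long> lastUsedNanos = new HashMap<>();  // endpoint -> last enqueue time
    private final Map<String, Integer> pending = new HashMap<>();     // endpoint -> undelivered messages

    IdleConnectionPruner(long idleTimeoutNanos) {
        this.idleTimeoutNanos = idleTimeoutNanos;
    }

    /** Sending to an endpoint (even a logically dead one) opens or refreshes its connection. */
    void onEnqueue(String endpoint, long nowNanos) {
        lastUsedNanos.put(endpoint, nowNanos);
        pending.merge(endpoint, 1, Integer::sum);
    }

    /** Delivered or failed: either way the message no longer needs draining. */
    void onCompleted(String endpoint) {
        // Returning null from the remapping function removes the entry.
        pending.computeIfPresent(endpoint, (k, n) -> n > 1 ? n - 1 : null);
    }

    /** Closes connections that are both idle and fully drained; returns how many were pruned. */
    int pruneIdle(long nowNanos) {
        int pruned = 0;
        for (Iterator<Map.Entry<String, Long>> it = lastUsedNanos.entrySet().iterator(); it.hasNext(); ) {
            Map.Entry<String, Long> e = it.next();
            boolean idle = nowNanos - e.getValue() >= idleTimeoutNanos;
            boolean drained = !pending.containsKey(e.getKey());
            if (idle && drained) {
                it.remove(); // a real implementation would also close the channel here
                pruned++;
            }
        }
        return pruned;
    }

    boolean isOpen(String endpoint) {
        return lastUsedNanos.containsKey(endpoint);
    }
}
```

The caller passes the current time explicitly, which keeps the rule deterministic and easy to test; a connection to a dead endpoint ages out once its queued messages have completed.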

h3. Reconnecting
We sometimes need to reconnect a perfectly valid connection, e.g. if the 
preferred IP address changes.  This is a rare scenario, but we now ensure that 
the underlying connection has no in-progress operations before closing it and 
reconnecting.
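A minimal sketch of the quiescing rule, assuming every call runs on a single event-loop thread (as with Netty), so plain fields suffice. {{ReconnectingConnection}} and its methods are hypothetical, and the actual close-and-reopen is stood in for by a generation counter.

```java
/**
 * Hypothetical sketch, assuming all calls happen on one event-loop thread,
 * so no synchronization is needed. The channel swap is modelled by a counter.
 */
final class ReconnectingConnection {
    private int inProgress;           // operations started on the current channel, not yet finished
    private boolean reconnectPending; // e.g. set when the preferred IP address changes
    private int channelGeneration;    // bumped each time the underlying channel is replaced

    /** Requests a reconnect; it happens only once in-progress operations reach zero. */
    void requestReconnect() {
        reconnectPending = true;
        maybeReconnect();
    }

    /** Returns false while a reconnect is pending; the caller should hold the work until it completes. */
    boolean tryBeginOperation() {
        if (reconnectPending) return false;
        inProgress++;
        return true;
    }

    void endOperation() {
        inProgress--;
        maybeReconnect();
    }

    private void maybeReconnect() {
        if (reconnectPending && inProgress == 0) {
            reconnectPending = false;
            channelGeneration++; // stands in for: close the old channel, open a new one
        }
    }

    int generation() {
        return channelGeneration;
    }
}
```

Refusing new operations while a reconnect is pending guarantees the in-progress count eventually reaches zero; otherwise a steady stream of work could postpone the reconnect indefinitely.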

h3. Message Failure
Failures now propagate to callbacks immediately, which helps prevent overload by reclaiming committed memory sooner.
* Expiry
** No longer suffers from head-of-line blocking (e.g. an undroppable message preventing all droppable messages behind it from being expired)
** While overloaded, expiry is attempted eagerly on enqueuing threads
** While disconnected, we schedule regular pruning, to handle the case where no messages are being sent but a large backlog remains to be expired
* Overload
** Tracked by bytes queued, rather than by number of messages
* Serialization Errors
** Do not invalidate the connection; the message is simply completed with failure, then erased from the frame
** Include any detected mismatch between the calculated and actual serialization size
* Failures to flush to the network, perhaps because the connection has been reset
** Callback handlers are not currently notified of these, as the necessary information has been discarded, though we could do so in future if we decide it is worthwhile
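The expiry and serialization-failure rules can be sketched together in Java. Everything here is hypothetical ({{QueuedMessage}}, {{OutboundQueueSketch}}, and a byte-array payload standing in for a real serializer); it simply shows a full-queue expiry scan that an undroppable head cannot block, byte-based accounting, and rewinding a frame so that only the failed message is lost.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.function.Consumer;

/** Hypothetical queued message; field names are illustrative only. */
final class QueuedMessage {
    final long expiresAtNanos;
    final boolean droppable;
    final byte[] payload;   // stands in for the message body to serialize
    final int declaredSize; // size calculated ahead of serialization

    QueuedMessage(long expiresAtNanos, boolean droppable, byte[] payload, int declaredSize) {
        this.expiresAtNanos = expiresAtNanos;
        this.droppable = droppable;
        this.payload = payload;
        this.declaredSize = declaredSize;
    }
}

final class OutboundQueueSketch {
    private final ArrayDeque<QueuedMessage> queue = new ArrayDeque<>();
    private long queuedBytes; // overload is judged by bytes queued, not message count

    void enqueue(QueuedMessage m) {
        queue.add(m);
        queuedBytes += m.declaredSize;
    }

    long queuedBytes() {
        return queuedBytes;
    }

    /**
     * Scans the whole queue instead of stopping at the head, so an undroppable message
     * cannot block expiry of droppable ones behind it. Failures reach callbacks at once.
     */
    int pruneExpired(long nowNanos, Consumer<QueuedMessage> onFailure) {
        int expired = 0;
        for (Iterator<QueuedMessage> it = queue.iterator(); it.hasNext(); ) {
            QueuedMessage m = it.next();
            if (m.droppable && nowNanos >= m.expiresAtNanos) {
                it.remove();
                queuedBytes -= m.declaredSize;
                onFailure.accept(m);
                expired++;
            }
        }
        return expired;
    }

    /**
     * Writes one message into a frame. On a serialization error, including a mismatch
     * between declared and actual size, the partial output is erased by rewinding the
     * frame and only this message fails; the connection stays usable.
     */
    static boolean writeOrErase(ByteBuffer frame, QueuedMessage m, Consumer<QueuedMessage> onFailure) {
        int start = frame.position();
        try {
            frame.put(m.payload); // throws BufferOverflowException if the frame is too small
            if (frame.position() - start != m.declaredSize) {
                throw new IllegalStateException("serialized size mismatch");
            }
            return true;
        } catch (RuntimeException e) {
            frame.position(start);
            onFailure.accept(m);
            return false;
        }
    }
}
```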

h3. Resource Limits
We improve system stability by enforcing strict limits on the outbound messages we may queue, measured as the sum of each message's serializedSize.  We impose three separate limits simultaneously, so that progress is always made and no reasonable combination of failures can impact a node’s stability.
* Per Connection: a connection may always queue messages up to this limit
* Globally: bytes exceeding the per-connection threshold must be allocated from a global limit
* Per Endpoint: bytes allocated from the global limit are also capped per-endpoint
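A hedged sketch of the three limits working together. {{ByteLimit}} and {{ThreeTierReserve}} are illustrative names, and for simplicity each reservation is taken wholly from one tier, whereas the comment says only the bytes exceeding the per-connection threshold come from the global limit.

```java
import java.util.concurrent.atomic.AtomicLong;

/** A byte budget that can be shared between threads; illustrative, not Cassandra's API. */
final class ByteLimit {
    private final long capacity;
    private final AtomicLong used = new AtomicLong();

    ByteLimit(long capacity) {
        this.capacity = capacity;
    }

    /** Atomically claims bytes if they fit under the capacity. */
    boolean tryAllocate(long bytes) {
        while (true) {
            long current = used.get();
            if (current + bytes > capacity) return false;
            if (used.compareAndSet(current, current + bytes)) return true;
        }
    }

    void release(long bytes) {
        used.addAndGet(-bytes);
    }

    long used() {
        return used.get();
    }
}

/** Per-connection reserve backed by shared per-endpoint and global limits. */
final class ThreeTierReserve {
    enum Tier { CONNECTION, SHARED, REJECTED }

    private final ByteLimit perConnection;
    private final ByteLimit perEndpoint; // shared by all connections to one endpoint
    private final ByteLimit global;      // shared by all connections on the node

    ThreeTierReserve(long perConnectionBytes, ByteLimit perEndpoint, ByteLimit global) {
        this.perConnection = new ByteLimit(perConnectionBytes);
        this.perEndpoint = perEndpoint;
        this.global = global;
    }

    Tier tryReserve(long bytes) {
        if (perConnection.tryAllocate(bytes)) return Tier.CONNECTION; // always-available allowance
        if (!perEndpoint.tryAllocate(bytes)) return Tier.REJECTED;    // endpoint's cap on shared bytes
        if (!global.tryAllocate(bytes)) {
            perEndpoint.release(bytes);                               // undo the partial claim
            return Tier.REJECTED;
        }
        return Tier.SHARED;
    }

    void release(Tier tier, long bytes) {
        switch (tier) {
            case CONNECTION: perConnection.release(bytes); break;
            case SHARED: perEndpoint.release(bytes); global.release(bytes); break;
            default: break; // nothing was reserved
        }
    }
}
```

Checking the endpoint cap before the global one, and rolling it back if the global claim fails, keeps one busy endpoint from exhausting the node-wide budget without ever leaking a partial reservation.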

h3. QoS
The “Gossip” connection has been replaced with a general-purpose “Urgent” connection, for any small messages that affect system stability.
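A tiny sketch of routing by this rule. {{ConnectionKind}}, the {{affectsStability}} flag, the 64 KiB cut-off for a "small" message, and sending oversized stability messages on the regular connection are all assumptions, not details from the patch.

```java
/** Hypothetical routing rule; the enum and the size threshold are illustrative assumptions. */
enum ConnectionKind {
    URGENT, REGULAR;

    static final int MAX_URGENT_BYTES = 64 * 1024; // assumed cut-off for "small"

    /** Small, stability-affecting messages (e.g. gossip) get the dedicated urgent connection. */
    static ConnectionKind select(boolean affectsStability, int serializedSize) {
        return affectsStability && serializedSize <= MAX_URGENT_BYTES ? URGENT : REGULAR;
    }
}
```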

h3. Metrics
We track, and expose via a virtual table and JMX, the number of messages and bytes that: we could not serialize or flush due to an error; we dropped due to overload or timeout; are pending; and we have successfully sent.
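A minimal sketch of the counters behind these metrics, using {{LongAdder}} so that many threads can update them cheaply. {{OutboundMetricsSketch}} and its {{Outcome}} values are hypothetical, and publishing them through JMX or a virtual table is left out.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-connection counters; exposure via JMX or a virtual table is out of scope. */
final class OutboundMetricsSketch {
    enum Outcome { ERROR, DROPPED_OVERLOAD, DROPPED_EXPIRED, SENT }

    private final Map<Outcome, LongAdder> messages = new EnumMap<>(Outcome.class);
    private final Map<Outcome, LongAdder> bytes = new EnumMap<>(Outcome.class);
    private final LongAdder pendingMessages = new LongAdder();
    private final LongAdder pendingBytes = new LongAdder();

    OutboundMetricsSketch() {
        // Populate every key up front so the maps are never mutated after construction.
        for (Outcome o : Outcome.values()) {
            messages.put(o, new LongAdder());
            bytes.put(o, new LongAdder());
        }
    }

    void onEnqueued(int size) {
        pendingMessages.increment();
        pendingBytes.add(size);
    }

    /** Assumes every completed message was first counted by onEnqueued. */
    void onCompleted(Outcome outcome, int size) {
        pendingMessages.decrement();
        pendingBytes.add(-size);
        messages.get(outcome).increment();
        bytes.get(outcome).add(size);
    }

    long messageCount(Outcome o) { return messages.get(o).sum(); }
    long byteCount(Outcome o) { return bytes.get(o).sum(); }
    long pendingMessages() { return pendingMessages.sum(); }
    long pendingBytes() { return pendingBytes.sum(); }
}
```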


> Improvements to Internode Messaging
> -----------------------------------
>
>                 Key: CASSANDRA-15066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15066
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Messaging/Internode
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Normal
>             Fix For: 4.0
>
>
> CASSANDRA-8457 introduced asynchronous networking to internode messaging, but
> there have been several follow-up endeavours to resolve some semantic issues.
> CASSANDRA-14503 and CASSANDRA-13630 are the latest such efforts, and were
> combined some months ago into a single overarching refactor of the original
> work, to address some of the issues that have been discovered.  Given the
> criticality of this work to the project, we wanted to bring some more eyes to
> bear to ensure the release goes ahead smoothly.  In doing so, we uncovered a
> number of issues with messaging, some of them long-standing, that we felt
> needed to be addressed.  This patch widens the scope of CASSANDRA-14503 and
> CASSANDRA-13630 in an effort to close the book on the messaging service, at
> least for the foreseeable future.
> The patch includes a number of clarifying refactors that touch code outside
> the {{net.async}} package, and a number of semantic changes to the
> {{net.async}} package itself.  We believe it clarifies the intent and
> behaviour of the code while improving system stability, as we will outline in
> the comments below.
> https://github.com/belliottsmith/cassandra/tree/messaging-improvements



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
