[ https://issues.apache.org/jira/browse/CASSANDRA-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801283#comment-16801283 ]
Benedict commented on CASSANDRA-15066:
--------------------------------------

h2. Outbound Connections

h3. Opening a connection
We behave consistently for all kinds of failure to connect, including: refused by endpoint, incompatible versions, bad handshake, unexpected exceptions; namely
* Retry forever, until we either succeed or have no messages waiting to be delivered
* Wait incrementally longer periods before reconnecting, up to a maximum of 1s (instead of yo-yo-ing between 100ms and 1s)
* While failing to connect, we do not acquire reserve queue limits (any already claimed wait until regular expiry)

h3. Closing a connection
* Correctly drains outbound messages that are waiting to be delivered (unless we are disconnected and fail to reconnect)
* Messages written to a closing connection are either delivered or rejected, with a new connection being opened if the old one is irrevocably closed
* Unused connections are pruned eventually

Cassandra does send messages to logically dead endpoints, due to races or other conditions; today these would effectively leak connections, as they are not followed by an attempt to close the connection.

h3. Reconnecting
We sometimes need to reconnect a perfectly valid connection, e.g. if the preferred IP address changes. This is a rare scenario, but we now ensure that the underlying connection has no in-progress operations before closing it and reconnecting.

h3. Message Failure
Propagates to callbacks instantly, better preventing overload by reclaiming committed memory.
* Expiry
** No longer experiences head-of-line blocking (e.g. an undroppable message preventing all droppable messages from being expired)
** While overloaded, expiry is attempted eagerly on enqueuing threads
** While disconnected we schedule regular pruning, to handle the case where messages are no longer being sent, but we have a large backlog to expire
* Overload
** Tracked by bytes queued, as opposed to number of messages
* Serialization Errors
** Do not result in the connection being invalidated; the message is simply completed with failure, then erased from the frame
** Includes a detected mismatch between the calculated serialisation size and the actual size
* Failures to flush to network, perhaps because the connection has been reset
** These are not currently notified to callback handlers, as the necessary information has been discarded, though it would be possible to do so in future if we decide it is worth our while

h3. Resource Limits
We improve system stability by enforcing strict limits on the volume of outbound messages we may queue, measured by the serializedSize of each message. There are three separate limits we impose simultaneously to ensure that progress is always made without any reasonable combination of failures impacting a node’s stability.
* Per Connection: always permit connections to queue messages up to this limit
* Globally: bytes exceeding the per-connection threshold must be allocated from a global limit
* Per Endpoint: bytes allocated from the global limit are also capped per-endpoint

h3. QoS
The “Gossip” connection has been replaced with a general-purpose “Urgent” connection, for any small messages impacting system stability.

h3. Metrics
We track, and expose via Virtual Table and JMX, the number of messages and bytes that: we could not serialise or flush due to an error, we dropped due to overload or timeout, are pending, and we have successfully sent.

> Improvements to Internode Messaging
> -----------------------------------
>
>                 Key: CASSANDRA-15066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15066
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Messaging/Internode
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Normal
>             Fix For: 4.0
>
>
> CASSANDRA-8457 introduced asynchronous networking to internode messaging, but
> there have been several follow-up endeavours to improve some semantic issues.
> CASSANDRA-14503 and CASSANDRA-13630 are the latest such efforts, and were
> combined some months ago into a single overarching refactor of the original
> work, to address some of the issues that have been discovered. Given the
> criticality of this work to the project, we wanted to bring some more eyes to
> bear to ensure the release goes ahead smoothly. In doing so, we uncovered a
> number of issues with messaging, some of which are long-standing, that we felt
> needed to be addressed. This patch widens the scope of CASSANDRA-14503 and
> CASSANDRA-13630 in an effort to close the book on the messaging service, at
> least for the foreseeable future.
> The patch includes a number of clarifying refactors that touch outside of the
> {{net.async}} package, and a number of semantic changes to the {{net.async}}
> package itself. We believe it clarifies the intent and behaviour of the
> code while improving system stability, which we will outline in comments
> below.
> https://github.com/belliottsmith/cassandra/tree/messaging-improvements
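
As a rough illustration of the three limits described under Resource Limits in the comment, the following is a minimal sketch only, not code from the patch. The names {{ByteLimit}}, {{OutboundQueueLimits}}, {{Claim}} and {{tryEnqueue}} are hypothetical, and it assumes a simplification the comment does not state: each message is charged in full either to the per-connection limit or to the shared (per-endpoint plus global) limits, rather than being split across them.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: a byte budget that is claimed and released atomically. */
final class ByteLimit {
    private final long capacity;
    private final AtomicLong used = new AtomicLong();

    ByteLimit(long capacity) { this.capacity = capacity; }

    /** Claims the bytes if they fit under the capacity; otherwise claims nothing. */
    boolean tryAcquire(long bytes) {
        while (true) {
            long current = used.get();
            long next = current + bytes;
            if (next > capacity) return false;
            if (used.compareAndSet(current, next)) return true;
        }
    }

    void release(long bytes) { used.addAndGet(-bytes); }

    long used() { return used.get(); }
}

/** Which budget a queued message was charged to (hypothetical). */
enum Claim { CONNECTION, SHARED, REJECTED }

/** Hypothetical per-connection view over the three limits. */
final class OutboundQueueLimits {
    private final ByteLimit perConnection; // owned by this connection alone
    private final ByteLimit perEndpoint;   // shared by all connections to one peer
    private final ByteLimit global;        // shared by every connection on the node

    OutboundQueueLimits(long connectionBytes, ByteLimit perEndpoint, ByteLimit global) {
        this.perConnection = new ByteLimit(connectionBytes);
        this.perEndpoint = perEndpoint;
        this.global = global;
    }

    /** Tries the per-connection limit first, then the endpoint and global limits together. */
    Claim tryEnqueue(long serializedSize) {
        if (perConnection.tryAcquire(serializedSize)) return Claim.CONNECTION;
        if (!perEndpoint.tryAcquire(serializedSize)) return Claim.REJECTED;
        if (!global.tryAcquire(serializedSize)) {
            // Undo the endpoint claim so a rejected message holds no capacity.
            perEndpoint.release(serializedSize);
            return Claim.REJECTED;
        }
        return Claim.SHARED;
    }

    /** Returns capacity once the message has been delivered, expired or failed. */
    void release(long serializedSize, Claim claim) {
        switch (claim) {
            case CONNECTION:
                perConnection.release(serializedSize);
                break;
            case SHARED:
                perEndpoint.release(serializedSize);
                global.release(serializedSize);
                break;
            default:
                break; // REJECTED messages never held any capacity
        }
    }
}
```

In this sketch the caller keeps the returned {{Claim}} alongside the queued message and hands it back to {{release}}, so that capacity is returned to the same budget it was taken from. A connection that is within its own limit never touches the shared budgets, which is what lets every connection keep making progress.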