Thanks, Jane, that clarifies a lot.

One area I’m still thinking about is how to handle non-opt-in connections
in mixed fleets. Even if they don’t receive GRACEFUL_DISCONNECT, the server
will still need a deterministic draining policy for them. It might help the
CEP to explicitly state how draining applies to connections that never
REGISTER for the event (same grace window and eventual enforcement), so
operators understand the guarantees in mixed environments.
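
For concreteness, here is the kind of uniform policy I have in mind. This is
only a sketch, and ClientConnection / DrainPolicy are illustrative names I made
up for the email, not anything that exists in the codebase today:

    import java.time.Duration;
    import java.util.Collection;

    // Illustrative sketch only: hypothetical types, not actual Cassandra classes.
    interface ClientConnection {
        boolean registeredFor(String eventType);
        void sendEvent(String eventType);
        void stopAcceptingRequests();
        void awaitInFlightCompletion(Duration timeout) throws InterruptedException;
        void close();
    }

    final class DrainPolicy {
        private final Duration gracePeriod; // e.g. graceful_disconnect_grace_period_ms
        private final Duration maxDrain;    // e.g. graceful_disconnect_max_drain_ms

        DrainPolicy(Duration gracePeriod, Duration maxDrain) {
            this.gracePeriod = gracePeriod;
            this.maxDrain = maxDrain;
        }

        void drain(Collection<ClientConnection> connections) throws InterruptedException {
            for (ClientConnection c : connections) {
                if (c.registeredFor("GRACEFUL_DISCONNECT")) {
                    c.sendEvent("GRACEFUL_DISCONNECT"); // opted-in drivers stop sending new queries
                }
                c.stopAcceptingRequests();              // non-opt-in connections are drained the same way
            }
            Thread.sleep(gracePeriod.toMillis());       // one grace window for every connection
            for (ClientConnection c : connections) {
                c.awaitInFlightCompletion(maxDrain);    // let in-flight requests finish, bounded
                c.close();                              // eventual enforcement, opted in or not
            }
        }
    }
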

Also, since this introduces a per-connection draining state, it may be
worth explicitly noting that drivers will need per-connection scheduling
awareness rather than treating host pools as homogeneous. That could help
driver developers interpret the intent consistently.
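
As a purely illustrative sketch (hypothetical names, not any real driver's
API), I am picturing the selection path consulting per-connection state rather
than a host-level up/down flag:

    import java.util.List;
    import java.util.Optional;

    // Illustrative sketch only: hypothetical types, not any real driver's API.
    enum ConnectionState { ACTIVE, DRAINING }

    interface PooledConnection {
        ConnectionState state();
        void markDraining(); // set when GRACEFUL_DISCONNECT arrives on this connection
    }

    final class ConnectionSelector {
        // A draining connection may still complete its in-flight requests,
        // but it is skipped for new queries even if other connections to the
        // same host are still ACTIVE.
        Optional<PooledConnection> select(List<PooledConnection> poolForHost) {
            return poolForHost.stream()
                              .filter(c -> c.state() == ConnectionState.ACTIVE)
                              .findFirst();
        }
    }
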

Patrick

On Tue, Feb 10, 2026 at 9:23 PM Jaydeep Chovatia <[email protected]>
wrote:

> >The proposed solution is to add an in-band GRACEFUL_DISCONNECT event
> that both control and query connections can opt into via REGISTER. When a
> node is shutting down, it will emit the event to all subscribed
> connections. Drivers will stop sending new queries on that connection/host,
> allow in-flight requests to finish, then reconnect with exponential backoff.
>
> Overall, I like the proposal because it helps reduce p99 latency from the
> client perspective by avoiding retries when the Cassandra server is
> restarted for planned activities, which happen quite frequently.
>
>
> On Tue, Feb 10, 2026 at 4:08 PM Jane H <[email protected]> wrote:
>
>> Hi Runtian,
>>
>> Yes, GRACEFUL_DISCONNECT is a per-connection draining signal, meaning a
>> node tells the client that this connection is going away, rather than a
>> node telling the client about other nodes.
>>
>> There isn’t a reliable way for a driver to distinguish whether a node is
>> permanently going away or just restarting, and Graceful Disconnect is
>> intentionally scoped to connection draining, not lifecycle intent.
>> In practice, this isn’t a big issue. After receiving GRACEFUL_DISCONNECT,
>> the driver stops sending new queries and only retries reconnection with
>> backoff. If the node is later removed from cluster metadata, the driver
>> stops reconnecting. The overhead during that window is limited to a few
>> reconnection attempts.
>> Adding intent like “going away” vs. “restarting” would be hard to make
>> reliable, since the server itself often doesn’t know whether it is going
>> away permanently or will come back up.
>>
>> Thank you for your questions! Hope this clarifies.
>>
>> Sincerely,
>> Jane
>>
>> On Wed, Feb 4, 2026 at 11:07 AM Runtian Liu <[email protected]> wrote:
>>
>>> Hi Jane, all,
>>>
>>> Thanks for the detailed discussion so far—this proposal resonates
>>> strongly with issues we see in production.
>>>
>>> I wanted to raise a related scenario around *gray failures* during
>>> shutdown. In some cases, an operator has clear intent to shut down a node
>>> (or the host is unhealthy), but the Cassandra process remains reachable for
>>> some time. During this window, service clients can still connect and
>>> continue sending queries, which often leads to timeouts and confusing
>>> behavior downstream. Gossip-based DOWN events or socket closes are not
>>> always timely or reliable enough to prevent this.
>>>
>>> A couple of questions on the intended semantics of GRACEFUL_DISCONNECT
>>> in this context:
>>>
>>>    1. *Can GRACEFUL_DISCONNECT be emitted explicitly by server-side
>>>    tooling (or an operator-triggered path)* to signal intentional
>>>    unavailability, even if the process is still alive? This would allow
>>>    operators to proactively instruct clients to stop sending traffic to a
>>>    node before or during shutdown-related gray failures.
>>>
>>>    2. After receiving GRACEFUL_DISCONNECT, *drivers may still attempt
>>>    reconnection after backoff*. Is there a way for drivers to
>>>    deterministically know that the server is intentionally being taken out
>>>    of service (as opposed to transient unavailability), so that they avoid
>>>    sending *any* new queries to that node until a restart or explicit
>>>    “back in service” signal is observed?
>>>
>>> Put differently, I’m curious whether GRACEFUL_DISCONNECT is meant to be
>>> purely a per-connection draining signal, or whether it could also serve as
>>> a stronger expression of *operator intent* to remove a node from
>>> service—something that would help eliminate gray-failure traffic entirely.
>>>
>>> Thanks again for pushing this forward; it looks very promising.
>>>
>>> Best,
>>> Runtian
>>>
>>> On Thu, Jan 29, 2026 at 5:44 PM Jane H <[email protected]> wrote:
>>>
>>>> Hi Patrick,
>>>>
>>>> Thanks for reading the CEP and for the thoughtful questions! Replies
>>>> below.
>>>>
>>>> Driver backward compatibility / mixed rollouts
>>>> --------
>>>> This is fully opt-in per connection. Older drivers won’t REGISTER for
>>>> GRACEFUL_DISCONNECT, so servers won’t send it to them, and those
>>>> connections behave exactly as they do today.
>>>>
>>>> REGISTER vs STARTUP for opt-in
>>>> -------
>>>> There are two plausible ways for a driver to opt in to
>>>> GRACEFUL_DISCONNECT:
>>>>
>>>> Option A: REGISTER (as proposed today)
>>>> | Driver behavior | Server behavior |
>>>> |---|---|
>>>> | Send `OPTIONS` | Return `SUPPORTED` (`Map<String, List<String>>`) containing `"GRACEFUL_DISCONNECT": ["true"]`. |
>>>> | Send `STARTUP` as normal | Optionally handle authentication as normal. Send `READY` as normal. |
>>>> | Send `REGISTER` including event type `GRACEFUL_DISCONNECT` | Acknowledge normally (e.g., `READY`). |
>>>>
>>>> This is consistent with the protocol: REGISTER is the standard
>>>> mechanism to subscribe to events.
>>>> However, this does add an extra round trip per query connection that
>>>> wants the event. Today most drivers only REGISTER on the control connection
>>>> for cluster-wide events (STATUS_CHANGE / TOPOLOGY_CHANGE / SCHEMA_CHANGE),
>>>> and query connections typically do not REGISTER anything. If we want every
>>>> query connection to receive GRACEFUL_DISCONNECT (because the signal is
>>>> connection-local), then every query connection would need to send REGISTER,
>>>> which means one additional message exchange during connection 
>>>> establishment.
>>>>
>>>> Option B: STARTUP opt-in (alternative)
>>>> | Driver behavior | Server behavior |
>>>> |---|---|
>>>> | Send `OPTIONS` | Return `SUPPORTED` (`Map<String, List<String>>`) containing `"GRACEFUL_DISCONNECT": ["true"]`. |
>>>> | Send `STARTUP` with an additional entry in the options map, e.g. `{ "CQL_VERSION": "3.0.0", "GRACEFUL_DISCONNECT": "true", ... }` | Optionally handle authentication as normal. Send `READY` as normal. |
>>>>
>>>> This avoids the extra round trip, because the opt-in piggybacks on an
>>>> existing step in the handshake. But it introduces new semantics: STARTUP
>>>> options would be used to request an event stream subscription, which is
>>>> non-standard given that REGISTER already exists for that purpose.
>>>>
>>>> Given the above, we prefer REGISTER for consistency with the protocol’s
>>>> semantics, even though it costs one additional round trip on each query
>>>> connection that opts in.
>>>>
>>>> Signal multiplication
>>>> --------
>>>> The protocol guidance about “don’t REGISTER on all connections” is
>>>> primarily aimed at the existing out-of-band events (STATUS_CHANGE /
>>>> TOPOLOGY_CHANGE / SCHEMA_CHANGE). Those events are gossip-driven and
>>>> broadcast by multiple nodes, so registering on many connections can easily
>>>> produce redundant notifications.
>>>>
>>>> Concrete example (duplication with STATUS_CHANGE):
>>>> * In a 3-node cluster (node1, node2, node3), node1 is going down.
>>>> * Node2 and node3 learn about node1’s state change via gossip.
>>>> * Both node2 and node3 will send a STATUS_CHANGE event (“node1 is
>>>> DOWN”) to every client connection that registered for STATUS_CHANGE.
>>>> * If a driver registers for STATUS_CHANGE on connections to both node2
>>>> and node3, it will receive two notifications for the same cluster event.
>>>> That’s the “signal multiplication” the spec warns about.
>>>>
>>>> But the protocol does not stop us from adding an in-band event like
>>>> GRACEFUL_DISCONNECT. In the above example of node1 going down:
>>>> * GRACEFUL_DISCONNECT is in-band and connection-local, not
>>>> gossip/broadcast.
>>>> * Only the node that is actually shutting down (node1) emits
>>>> GRACEFUL_DISCONNECT, and it emits it only on its own native connections
>>>> that opted in.
>>>> * Node2 and node3 do not emit GRACEFUL_DISCONNECT for node1’s shutdown,
>>>> because they are not the node being drained.
>>>> So even if a driver has connections to node2 and node3 that are
>>>> registered for other events, it will not receive any GRACEFUL_DISCONNECT
>>>> from them for node1 going down.
>>>>
>>>> I understand such an in-band event is new. We can add a clarification
>>>> to the protocol explaining that the recommendation of “don’t REGISTER on
>>>> all connections” will not apply to in-band events like GRACEFUL_DISCONNECT.
>>>>
>>>> Event timing for operators
>>>> ---------
>>>> A server should emit GRACEFUL_DISCONNECT whenever it needs to close a
>>>> connection gracefully, regardless of the trigger.
>>>> I’ll update the CEP to clarify that GRACEFUL_DISCONNECT is emitted whenever
>>>> the server intends to close a connection gracefully, including nodetool
>>>> drain, nodetool disablebinary + shutdown, rolling restarts, or a controlled
>>>> JVM shutdown hook path.
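>>>>
>>>> For illustration only (hypothetical names, not actual Cassandra internals),
>>>> the same drain routine could be invoked from any of those triggers, with the
>>>> JVM shutdown hook covering the controlled-shutdown path:
>>>>
>>>>     // Illustrative sketch only: hypothetical types, not actual Cassandra internals.
>>>>     interface NativeTransport {
>>>>         void emitGracefulDisconnectToOptedIn(); // in-band event on opted-in connections
>>>>         void stopAcceptingNewRequests();
>>>>         void awaitInFlight(long maxDrainMillis);
>>>>         void closeAllConnections();
>>>>     }
>>>>
>>>>     final class GracefulShutdown {
>>>>         private final NativeTransport transport;
>>>>         private final long maxDrainMillis; // graceful_disconnect_max_drain_ms
>>>>
>>>>         GracefulShutdown(NativeTransport transport, long maxDrainMillis) {
>>>>             this.transport = transport;
>>>>             this.maxDrainMillis = maxDrainMillis;
>>>>         }
>>>>
>>>>         void install() {
>>>>             // nodetool drain / disablebinary would call drain() directly;
>>>>             // the hook covers a controlled JVM shutdown.
>>>>             Runtime.getRuntime().addShutdownHook(new Thread(this::drain));
>>>>         }
>>>>
>>>>         void drain() {
>>>>             transport.emitGracefulDisconnectToOptedIn();
>>>>             transport.stopAcceptingNewRequests();
>>>>             transport.awaitInFlight(maxDrainMillis);
>>>>             transport.closeAllConnections();
>>>>         }
>>>>     }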
>>>>
>>>> Operator control + observability
>>>> ---------
>>>> +1. I agree to add the server-side configs:
>>>> * graceful_disconnect_enabled
>>>> * graceful_disconnect_grace_period_ms
>>>> * graceful_disconnect_max_drain_ms
>>>> And metrics/counters such as: connections_draining, forced_disconnects.
>>>> I’ll update the CEP accordingly.
>>>>
>>>> Thanks again—this feedback is super helpful for tightening the proposal.
>>>>
>>>> Regards,
>>>> Jane
>>>>
>>>> On Wed, Jan 14, 2026 at 1:33 PM Patrick McFadin <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Jane,
>>>>>
>>>>> Thank you for the thought-out CEP. I certainly see the use of
>>>>> a feature like this to add resilience during cluster state changes. I have
>>>>> a few questions after reading the CEP.
>>>>>
>>>>> Driver compatibility: The way I read this, it's based on an ideal
>>>>> scenario where client and server are on the same version to support this
>>>>> feature. In my experience, client rollouts are never complete and often 
>>>>> lag
>>>>> far behind the cluster upgrade. What happens when the driver completely
>>>>> ignores GRACEFUL_DISCONNECT? It might mean considering something on the
>>>>> server side.
>>>>>
>>>>> Discovery things: Speaking of the client, you want to use the
>>>>> SUPPORTED as listed in the v4 spec[1], but why not add this to STARTUP? 
>>>>> You
>>>>> mention something in the "Rejected alternatives," but could you expand 
>>>>> your
>>>>> thinking here?
>>>>>
>>>>> Signal multiplication: You have this in the CEP "Other protocols
>>>>> (HTTP/2, PostgreSQL, Redis Cluster) use connection-local in-band signals 
>>>>> to
>>>>> enable safe draining." Our protocol guidance[1] explicitly notes that
>>>>> drivers often keep multiple connections and should not register for events
>>>>> on all of them, as this duplicates traffic. I don't know how you could
>>>>> ensure that every connection would be aware of a GRACEFUL_DISCONNECT
>>>>> without changing that aspect of the spec.
>>>>>
>>>>>
>>>>> Event timing for operators: It's not clear to me when
>>>>> the GRACEFUL_DISCONNECT is emitted when you do something like a drain,
>>>>> disablebinary or just a JVM shutdown hook. This is crucial for operators 
>>>>> to
>>>>> understand how this could work and should be in the CEP spec for clarity. 
>>>>> I
>>>>> think it will matter to a lot of people.
>>>>>
>>>>> Operator control: I've been on this push for a while and so I have to
>>>>> mention it. Opt-in vs default. We need more controls in the config YAML.
>>>>> graceful_disconnect_enabled
>>>>>
>>>>> If there is a server-side component:
>>>>> graceful_disconnect_grace_period_ms
>>>>> graceful_disconnect_max_drain_ms
>>>>>
>>>>> And finally, it needs more observability...
>>>>> logging/metrics counters: connections_draining, forced_disconnects
>>>>>
>>>>>
>>>>> Thanks for proposing this!
>>>>>
>>>>> Patrick
>>>>>
>>>>> 1 -
>>>>> https://cassandra.apache.org/doc/latest/cassandra/_attachments/native_protocol_v4.html
>>>>>
>>>>> On Tue, Jan 13, 2026 at 4:30 PM Jane H <[email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I’d like to start a discussion on a CEP proposal: *CEP-59: Graceful
>>>>>> Disconnect*, to make intentional node shutdown/drain less disruptive
>>>>>> for clients (link:
>>>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406619103
>>>>>> ).
>>>>>>
>>>>>> Today, intentional node shutdown (e.g., rolling restarts) can still
>>>>>> be disruptive from a client perspective. Drivers often ignore DOWN
>>>>>> events because they are not reliable, and outstanding requests can end up
>>>>>> as client-facing TimeOut exceptions.
>>>>>>
>>>>>> The proposed solution is to add an in-band GRACEFUL_DISCONNECT event
>>>>>> that both control and query connections can opt into via REGISTER.
>>>>>> When a node is shutting down, it will emit the event to all subscribed
>>>>>> connections. Drivers will stop sending new queries on that 
>>>>>> connection/host,
>>>>>> allow in-flight requests to finish, then reconnect with exponential 
>>>>>> backoff.
>>>>>>
>>>>>> If you have thoughts on the proposed protocol, server shutdown
>>>>>> behavior, driver expectations, edge cases, or general feedback, I’d 
>>>>>> really
>>>>>> appreciate it.
>>>>>>
>>>>>> Regards,
>>>>>> Jane
>>>>>>
>>>>>
