I think it's all part of the same issue and you're not derailing IMO Abe. For 
the user Pabbireddy here, the unexpected behavior was that internode 
connections were not closed on the keystore refresh. So ISTM, from a 
"featureset that would be nice to have here" perspective, we could 
theoretically provide:
 1. An option to disconnect all connections on cert update, disabled by default
 2. An option to drain and recycle connections on a time period, also disabled 
by default
Leave the current behavior in place but allow for these kinds of strong 
cert guarantees if a user needs them in their env.

On Mon, Apr 15, 2024, at 9:51 PM, Abe Ratnofsky wrote:
> Not to derail from the original conversation too far, but wanted to agree 
> that maximum connection establishment time on native transport would be 
> useful. That would provide a maximum duration before an updated client 
> keystore is used for connections, which can be used to safely roll out client 
> keystore updates.
> 
> For example, if the maximum connection establishment time is 12 hours, then 
> you can update the keystore on a canary client, wait 24 hours, confirm that 
> connectivity is maintained, then upgrade keystores across the rest of the 
> fleet.
> 
> With unbounded connection establishment, reconnection isn't tested as often 
> and issues can hide behind long-lived connections.
> 
>> On Apr 15, 2024, at 5:14 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>> 
>> It seems like if folks really want the life of a connection to be finite 
>> (either client/server or server/server), adding in an option to quietly 
>> drain and recycle a connection on some period isn’t that difficult.
>> 
>> That type of requirement shows up in a number of environments, usually on 
>> interactive logins (cqlsh, login, walk away, the connection needs to become 
>> invalid in a short and finite period of time), but adding it to internode 
>> could also be done, and may help in some weird situations (if you changed 
>> certs because you believe a key/cert is compromised, having the connection 
>> remain active is decidedly inconvenient, so maybe it does make sense to add 
>> an expiration timer/condition on each connection).
>> 
>> 
>> 
>>> On Apr 15, 2024, at 12:28 PM, Dinesh Joshi <djo...@apache.org> wrote:
>>> 
>>> In addition to what Andy mentioned, I want to point out that for the vast 
>>> majority of use-cases, we would like to _avoid_ interruptions when a 
>>> certificate is updated, so this behavior is by design. If you're dealing 
>>> with a situation where you want to ensure that the connections are cycled, 
>>> you can follow Andy's advice. It will require automation outside the 
>>> database, which you might already have. If there is demand, we can consider 
>>> adding a feature to slowly cycle the connections so the old SSL context is 
>>> not used anymore.
>>> 
>>> One more thing you should bear in mind is that Cassandra will not load the 
>>> new SSL context if it cannot successfully initialize it. This is again by 
>>> design to prevent an outage when the updated truststore is corrupted or 
>>> could not be read in some way.
>>> 
>>> thanks,
>>> Dinesh
>>> 
>>> On Mon, Apr 15, 2024 at 9:45 AM Tolbert, Andy <x...@andrewtolbert.com> 
>>> wrote:
>>>> I should mention, when toggling disablebinary/enablebinary between
>>>> instances, you will probably want to give some time between doing this
>>>> so connections can reestablish, and you will want to verify that the
>>>> connections can actually reestablish.  You also need to be mindful of
>>>> this being disruptive to inflight queries (if your client is
>>>> configured for retries it will probably be fine).  Semantically to
>>>> your applications it should look a lot like a rolling cluster bounce.
>>>> 
>>>> Thanks,
>>>> Andy
>>>> 
>>>> On Mon, Apr 15, 2024 at 11:39 AM pabbireddy avinash
>>>> <pabbireddyavin...@gmail.com> wrote:
>>>> >
>>>> > Thanks Andy for your reply . We will test the scenario you mentioned.
>>>> >
>>>> > Regards
>>>> > Avinash
>>>> >
>>>> > On Mon, Apr 15, 2024 at 11:28 AM, Tolbert, Andy <x...@andrewtolbert.com> 
>>>> > wrote:
>>>> >>
>>>> >> Hi Avinash,
>>>> >>
>>>> >> As far as I understand it, if the underlying keystore/truststore(s)
>>>> >> Cassandra is configured with are updated, this *will not* provoke
>>>> >> Cassandra to interrupt existing connections; the new stores are
>>>> >> simply used for future TLS initialization.
>>>> >>
>>>> >> Via: 
>>>> >> https://cassandra.apache.org/doc/4.1/cassandra/operating/security.html#ssl-certificate-hot-reloading
>>>> >>
>>>> >> > When the files are updated, Cassandra will reload them and use them 
>>>> >> > for subsequent connections
>>>> >>
>>>> >> I suppose one could do a rolling disablebinary/enablebinary (if it's
>>>> >> only client connections) after you roll out a keystore/truststore
>>>> >> change as a way of enforcing the existing connections to reestablish.
>>>> >>
>>>> >> Thanks,
>>>> >> Andy
>>>> >>
>>>> >>
>>>> >> On Mon, Apr 15, 2024 at 11:11 AM pabbireddy avinash
>>>> >> <pabbireddyavin...@gmail.com> wrote:
>>>> >> >
>>>> >> > Dear Community,
>>>> >> >
>>>> >> > I hope this email finds you well. I am currently testing SSL 
>>>> >> > certificate hot reloading on a Cassandra cluster running version 4.1 
>>>> >> > and encountered a situation that requires your expertise.
>>>> >> >
>>>> >> > Here's a summary of the process and issue:
>>>> >> >
>>>> >> > Reloading Process: We reloaded certificates signed by our in-house 
>>>> >> > certificate authority into the cluster, which was initially running 
>>>> >> > with self-signed certificates. The reload was done node by node.
>>>> >> >
>>>> >> > Truststore and Keystore: The truststore and keystore passwords are 
>>>> >> > the same across the cluster.
>>>> >> >
>>>> >> > Unexpected Behavior: Despite the different truststore configurations 
>>>> >> > for the self-signed and new CA certificates, we observed no breakdown 
>>>> >> > in server-to-server communication during the reload. We did not 
>>>> >> > upload the new CA cert into the old truststore. We anticipated 
>>>> >> > interruptions due to the differing truststore configurations but did 
>>>> >> > not encounter any.
>>>> >> >
>>>> >> > Post-Reload Changes: After reloading, we updated the cqlshrc file 
>>>> >> > with the new CA certificate and key to connect to cqlsh.
>>>> >> >
>>>> >> > server_encryption_options:
>>>> >> >     internode_encryption: all
>>>> >> >     keystore: ~/conf/server-keystore.jks
>>>> >> >     keystore_password: XXXX
>>>> >> >     truststore: ~/conf/server-truststore.jks
>>>> >> >     truststore_password: XXXX
>>>> >> >     protocol: TLS
>>>> >> >     algorithm: SunX509
>>>> >> >     store_type: JKS
>>>> >> >     cipher_suites: [TLS_RSA_WITH_AES_256_CBC_SHA]
>>>> >> >     require_client_auth: true
>>>> >> >
>>>> >> > client_encryption_options:
>>>> >> >     enabled: true
>>>> >> >     keystore: ~/conf/server-keystore.jks
>>>> >> >     keystore_password: XXXX
>>>> >> >     require_client_auth: true
>>>> >> >     truststore: ~/conf/server-truststore.jks
>>>> >> >     truststore_password: XXXX
>>>> >> >     protocol: TLS
>>>> >> >     algorithm: SunX509
>>>> >> >     store_type: JKS
>>>> >> >     cipher_suites: [TLS_RSA_WITH_AES_256_CBC_SHA]
>>>> >> >
>>>> >> > Given this situation, I have the following questions:
>>>> >> >
>>>> >> > Could there be a reason for the continuity of server-to-server 
>>>> >> > communication despite the different truststores?
>>>> >> > Is there a possibility that the old truststore remains cached even 
>>>> >> > after reloading the certificates on a node?
>>>> >> > Have others encountered similar issues, and if so, what were your 
>>>> >> > solutions?
>>>> >> >
>>>> >> > Any insights or suggestions would be greatly appreciated. Please let 
>>>> >> > me know if further information is needed.
>>>> >> >
>>>> >> > Thank you
>>>> >> >
>>>> >> > Best regards,
>>>> >> >
>>>> >> > Avinash