In the context of what I believe initially motivated this hot
reloading capability, a big win it provides is avoiding having to
bounce your cluster as your certificates near expiry.  If expiry isn't
watched closely, you can end up in a state where the certificate on
every node in the cluster has expired, which is effectively an outage.

I see the appeal of draining connections on a change of trust,
although the need to be able to "do it live" (as opposed to doing a
bounce) seems less important than avoiding the outage caused by
certificates expiring, especially since you can already approximate
this without a bounce by toggling nodetool
disablebinary/enablebinary.  I agree with Dinesh that most operators
would prefer that it not drain connections automatically, as
interrupting connections can be disruptive to applications that don't
have retries configured, but I also agree it'd be a nice improvement
to support draining existing connections in some way.

+1 on the idea of a "timed connection" capability brought up here,
implemented in a way that lets connection lifetimes be adjusted
dynamically.  That way, on a trust store change Cassandra could simply
adjust the lifetimes of existing connections so that they are either
disconnected immediately or drained over a time period, as Josh
proposed.
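A minimal sketch of the "timed connection" idea, assuming each
connection simply carries a deadline that a periodic reaper checks.
`ConnectionLifetimes`, `drain` and `reap_expired` are hypothetical
names, not existing Cassandra APIs, and Python is used only for
brevity:

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TrackedConnection:
    """Hypothetical stand-in for a client connection; only tracks a deadline."""
    conn_id: int
    deadline: Optional[float] = None  # None means no lifetime limit


class ConnectionLifetimes:
    """Hypothetical registry of connection lifetimes that can be
    adjusted at runtime (not a real Cassandra class)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._conns: Dict[int, TrackedConnection] = {}

    def register(self, conn_id: int, max_lifetime: Optional[float] = None) -> None:
        # A connection may start with a bounded or unbounded lifetime.
        deadline = None if max_lifetime is None else self._clock() + max_lifetime
        self._conns[conn_id] = TrackedConnection(conn_id, deadline)

    def drain(self, window: float) -> None:
        """Adjust lifetimes so existing connections close spread evenly
        over `window` seconds; window=0 disconnects them all immediately."""
        now = self._clock()
        conns = sorted(self._conns.values(), key=lambda c: c.conn_id)
        n = len(conns)
        for i, conn in enumerate(conns, start=1):
            # Stagger deadlines so clients don't all reconnect at once.
            new_deadline = now + window * i / n
            # Only ever shorten a lifetime, never extend it.
            if conn.deadline is None or new_deadline < conn.deadline:
                conn.deadline = new_deadline

    def reap_expired(self) -> List[int]:
        """Remove and return the ids of connections whose deadline has
        passed; a real server would close those sockets here."""
        now = self._clock()
        expired = [c.conn_id for c in self._conns.values()
                   if c.deadline is not None and c.deadline <= now]
        for conn_id in expired:
            del self._conns[conn_id]
        return sorted(expired)
```

On a trust store change, `drain(window)` would drain connections over
a time period and `drain(0)` would disconnect them immediately.
Staggering the deadlines is meant to avoid every client reconnecting
at the same moment.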

Thanks,
Andy
