Hello Sebastian,
Yes, your approach is really interesting. I will test this in my system as
well. I think it reduces some risks involved in the procedure that was
discussed in the previous emails.
Just for the record, availability is a top priority for my use cases, which
is why I have switched the default consistency level for
authentication/authorization to LOCAL_ONE, as it was used in previous C*
versions.

BR
MK
-----Original Message-----
From: Sebastian Marsching <sebast...@marsching.com> 
Sent: April 22, 2024 21:58
To: Michalis Kotsiouros (EXT) via user <user@cassandra.apache.org>
Subject: Re: Datacenter decommissioning on Cassandra 4.1.4

Recently, I successfully used the following procedure when decommissioning a
datacenter:

1. Reduced the replication factor for this DC to zero for all keyspaces
except the system_auth keyspace. For that keyspace, I reduced the RF to one.
2. Decommissioned all nodes except one in the DC using the regular procedure
(no --force needed).
3. Decommissioned the last node using --force.
4. Set the RF for the system_auth keyspace to 0.
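The keyspace changes in steps 1 and 4 can be sketched as a small script that
only builds the CQL statements and does not run them. This is a rough
illustration, not a tested tool: the function names, the keyspace "my_app" and
the datacenter names "dc1"/"dc2" are made up, and it assumes all keyspaces use
NetworkTopologyStrategy and that the cluster accepts an explicit RF of 0 for a
DC.

```python
def alter_keyspace_cql(keyspace, dc_replication):
    """Build an ALTER KEYSPACE statement (NetworkTopologyStrategy assumed)."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {rf}" for dc, rf in dc_replication.items()]
    return f"ALTER KEYSPACE {keyspace} WITH replication = {{{', '.join(parts)}}};"


def step1_statements(keyspaces, dc):
    """Step 1: RF 0 in `dc` for every keyspace, except system_auth which gets RF 1."""
    statements = []
    for name, replication in keyspaces.items():
        new_replication = dict(replication)
        new_replication[dc] = 1 if name == "system_auth" else 0
        statements.append(alter_keyspace_cql(name, new_replication))
    return statements


def step4_statement(system_auth_replication, dc):
    """Step 4: after the last node is gone, drop system_auth to RF 0 in `dc`."""
    new_replication = dict(system_auth_replication)
    new_replication[dc] = 0
    return alter_keyspace_cql("system_auth", new_replication)


if __name__ == "__main__":
    # Hypothetical cluster layout; "dc2" is the datacenter being removed.
    keyspaces = {
        "system_auth": {"dc1": 3, "dc2": 3},
        "my_app": {"dc1": 3, "dc2": 3},
    }
    for statement in step1_statements(keyspaces, "dc2"):
        print(statement)
    # Steps 2 and 3 (nodetool decommission, then --force on the last node)
    # happen on the nodes themselves, between these two sets of statements.
    print(step4_statement(keyspaces["system_auth"], "dc2"))
```

The statements would still have to be reviewed and executed by hand (for
example through cqlsh), one step at a time.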

This procedure has two benefits:

1. Authentication on the nodes in the DC being decommissioned will work
until the last node has been decommissioned. This is important when
authentication is enabled for JMX. Otherwise, you cannot proceed when there
are too few nodes left to get a LOCAL_QUORUM on system_auth.
2. One does not have to use --force except when removing the last node.

It would be nice if the RF for the system_auth keyspace could be reduced to
zero before decommissioning the nodes. However, I think that implementing
this correctly may be hard. If there are no local replicas, queries with a
consistency level of LOCAL_QUORUM will probably fail, and this is the
consistency level used for all authentication and authorization related
queries. So, setting the RF to zero might break authentication and
authorization, which in turn might make it impossible to decommission the
nodes (without disabling authentication for that DC).

So, I guess that the code dealing with authentication and authorization
would have to be changed to use a CL of QUORUM instead of LOCAL_QUORUM when
system_auth is not replicated in the local DC.
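The arithmetic behind this reasoning can be sketched as follows. It assumes
the usual quorum size of floor(RF / 2) + 1 and that the same formula is
applied when the local RF is zero. The function names are made up for
illustration and are not part of Cassandra.

```python
def quorum_size(rf):
    """Number of replicas that must respond, assuming floor(RF / 2) + 1."""
    return rf // 2 + 1


def can_satisfy(rf, live_replicas):
    """True if enough replicas are alive to reach a quorum for this RF."""
    return live_replicas >= quorum_size(rf)


if __name__ == "__main__":
    # LOCAL_QUORUM in the DC being removed, system_auth still at RF 3,
    # but only one node left: not enough replicas.
    print(can_satisfy(rf=3, live_replicas=1))

    # LOCAL_QUORUM with system_auth reduced to RF 1: a single live
    # replica is enough, so authentication keeps working.
    print(can_satisfy(rf=1, live_replicas=1))

    # LOCAL_QUORUM with RF 0 in the local DC: one replica would be
    # required, but there are no local replicas at all.
    print(can_satisfy(rf=0, live_replicas=0))

    # QUORUM over all DCs (RF 3 in the remaining DC plus RF 0 locally):
    # the replicas in the other DC can answer.
    print(can_satisfy(rf=3 + 0, live_replicas=3))
```

Under these assumptions, only the first and third cases fail, which matches
the behaviour described above.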
