Hi,

As we are currently facing the same challenge (upgrading an existing cluster
from C* 3 to C* 4), I wanted to share our strategy with you. It is largely what
Scott already suggested, but I have some extra details, so I thought it might
still be useful.

We duplicated our cluster using the strategy described at 
http://adamhutson.com/cloning-cassandra-clusters-the-fast-way/. Of course, it is
possible to figure out all the steps on your own, but I feel like this detailed
guide saved me at least a few hours, if not days. Instead of restoring from a
backup, we chose to create a snapshot on the live nodes and copy the data from
there, but this does not really change the overall process.
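When copying from snapshots instead of a backup, you first have to locate the
snapshot files on each live node. A minimal sketch, assuming the standard
on-disk layout <data_dir>/<keyspace>/<table>-<id>/snapshots/<tag>/ and a
snapshot tag of your choosing (the function name and the tag are hypothetical,
not part of the guide):

```python
from __future__ import annotations

from pathlib import Path


def find_snapshot_dirs(data_dir: str, tag: str) -> dict[tuple[str, str], Path]:
    """Return {(keyspace, table): snapshot_dir} for every table snapshotted under `tag`.

    Assumes the layout <data_dir>/<keyspace>/<table>-<id>/snapshots/<tag>/.
    The tag should not contain glob wildcard characters.
    """
    result: dict[tuple[str, str], Path] = {}
    for snap_dir in Path(data_dir).glob(f"*/*/snapshots/{tag}"):
        table_dir = snap_dir.parent.parent          # <table>-<id>
        keyspace = table_dir.parent.name            # <keyspace>
        table = table_dir.name.rsplit("-", 1)[0]    # strip the "-<id>" suffix
        result[(keyspace, table)] = snap_dir
    return result
```

The resulting directories are what gets copied into the matching table
directories on the corresponding test node.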

We only run a single-data-center cluster, but I think that this process easily
translates to a multi-data-center setup. In that case, you can choose to clone
only a single data center, or you can clone a few or all of them, if you deem
this to be necessary for your tests. The only “limitation” is that for each
data center that you clone, your test cluster needs exactly the same number of
nodes as the respective data center of your production cluster.
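The node-count limitation follows from how the clone is built, assuming a
token-copying approach like the one in the linked guide: each test node takes
over the token set of exactly one production node (via initial_token in
cassandra.yaml) together with that node's data. A minimal planning sketch; the
input shapes and the function name are hypothetical, and in practice the tokens
would come from something like nodetool ring:

```python
from __future__ import annotations


def plan_clone(
    prod: dict[str, dict[str, list[int]]],  # {dc: {prod_node: [tokens]}}
    test: dict[str, list[str]],             # {dc: [test_node, ...]} for cloned DCs only
) -> dict[str, tuple[str, str]]:
    """Map every test node to one production node in the same data center.

    Returns {test_node: (prod_node, initial_token_value)}. Raises ValueError if a
    cloned data center does not have exactly as many test nodes as production nodes.
    """
    plan: dict[str, tuple[str, str]] = {}
    for dc, test_nodes in test.items():
        if dc not in prod:
            raise ValueError(f"data center {dc!r} does not exist in production")
        prod_nodes = sorted(prod[dc])
        if len(prod_nodes) != len(test_nodes):
            raise ValueError(
                f"{dc}: {len(test_nodes)} test nodes for {len(prod_nodes)} production nodes"
            )
        # One-to-one pairing: each test node inherits one production node's tokens.
        for test_node, prod_node in zip(sorted(test_nodes), prod_nodes):
            tokens = sorted(prod[dc][prod_node])
            plan[test_node] = (prod_node, ",".join(str(t) for t in tokens))
    return plan
```

Data centers missing from `test` are simply not cloned, which matches the
option of cloning only some of them. The number of tokens per node
(num_tokens) also has to match on the test nodes.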

Once the cluster is cloned, you can test whatever you like (e.g. upgrade to C* 
4, test operations in a mixed-version cluster, etc.).

Our experience with the upgrade from C* 3.11 to C* 4.1 on the test cluster was
quite smooth. The only problem we saw came later, when we added a second data
center to the test cluster: we got a lot of CorruptSSTableExceptions on one of
the nodes in the existing data center. We first attributed this to the
upgrade, but later we found out that this also happens when running on C* 3.11.

We now believe that the hardware of one of the nodes that we used for the test
cluster is defective, because the exceptions were limited to this exact node,
even after moving data around. It just took us a while to figure this out,
because the hardware for the test cluster was brand new, so “broken hardware”
wasn’t our first guess. We are still in the process of definitively proving
that this specific piece of hardware is broken, but we are now sufficiently
confident in the stability of C* 4 that we will soon move forward with
upgrading the production cluster.

-Sebastian
