Thank you for submitting this CEP!

Wanted to discuss this point from the description:

> How to bring up/down Cassandra/Sidecar instances or making/applying config 
> changes are outside the scope of this document.

One advantage of doing the migration via the Sidecar is that we can stream 
sstables to the target node from the source node while the source node is down. 
Also, if the source node is down, it does not matter that we can't use it as a 
write target. However, if we are replacing a live node, we do lose both 
durability and availability during the second copy phase. Numerous other 
advantages are described by others in the thread above.

For example, say we have three adjacent nodes A, B, and C, and a simple RF of 3. 
C (source) is up and is being replaced with live-migrated D (destination). 
According to the process described in CEP-40, we perform streaming in two 
phases: the first is a full copy (similar to bootstrap/replacement in 
Cassandra), and the second is just a diff. The second phase is still going to 
take a non-trivial amount of time, likely lasting at the very least minutes. 
During this time, we only have nodes A and B as both read and write targets, 
with no alternatives: both of them must be present for any operation, and 
losing either one leaves us with only one copy of the data.
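To make that arithmetic concrete, here is a minimal sketch in plain Python 
(the variable names are mine for illustration; this is not Cassandra code) of 
why the diff phase leaves zero failure tolerance:

```python
# Majority quorum for RF=3: any 2 of the 3 replicas must respond.
RF = 3
quorum = RF // 2 + 1   # = 2

# During the diff phase, C (source) and D (destination) are both unusable,
# leaving only A and B as read/write targets.
live_targets = 2

# Failures we can tolerate while still reaching a quorum:
tolerated = live_targets - quorum
print(tolerated)   # 0 -- losing either A or B makes quorum operations fail
```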

To contrast this, the TCM bootstrap process has four steps: between the old 
owner being phased out and the new owner being brought in, we always ensure r/w 
quorum consistency and the liveness of at least 2 nodes for the read quorum 
(with 3 nodes available for reads in the best case), and 2+1 pending replicas 
for the write quorum, with 4 nodes (3 existing owners + 1 pending) available 
for writes in the best case. Replacement in TCM is implemented similarly, with 
the old node remaining an (unavailable) read target, but the new node already 
being the target for (pending) writes.

We can implement CEP-40 using a similar approach: leave the source node as both 
a read and write target, and allow the new node to be a target for (pending) 
writes. Unfortunately, this does not help with write availability (in fact, it 
decreases it, since we will have to collect 2+1 mandatory write responses 
instead of just 2), but it increases durability, and I think it helps to fully 
eliminate the second phase. It also increases read availability while the 
source node is up, since we can still use the source node as part of the read 
quorum.
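A small sketch of the pending-replica arithmetic (again plain Python with 
illustrative names, not TCM code), comparing the plain second-phase situation 
with the variant where the source stays a r/w target and the destination takes 
pending writes:

```python
def quorum(rf):
    """Majority quorum size for a given replication factor."""
    return rf // 2 + 1

RF = 3

# CEP-40 second phase as currently described: C and D are both unusable,
# so A and B are the only targets and no failures are tolerated.
plain_targets   = 2
plain_tolerated = plain_targets - quorum(RF)      # 0

# Pending-replica variant: source C stays a read/write target, destination D
# accepts (pending) writes. Writes must collect quorum + all pending acks.
pending        = 1
write_targets  = RF + pending                     # 4 nodes can take a write
write_required = quorum(RF) + pending             # 2 + 1 = 3 mandatory acks
read_targets   = RF                               # A, B, and the source C
read_tolerated = read_targets - quorum(RF)        # 1 failure tolerated

print(plain_tolerated, write_required, read_tolerated)
```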

I think if we want to call this feature "live migration", we may want to 
provide guarantees similar to those the term implies in the hypervisor 
community, where it describes an instantaneous and uninterrupted migration of 
an instance from one host to another, without the guest instance being able to 
notice anything more than a time jump.

I am also not opposed to doing this post-factum, after implementing the CEP in 
its current form, but I think it would be good to have a clear understanding of 
the availability and durability guarantees we want to provide with it, and to 
state them explicitly for both the "source node down" and "source node up" 
cases. That said, since we will have to integrate CEP-40 with TCM, and will 
have to ensure the correctness of sstable diffing for the second phase, it 
might make sense to consider reusing some of the existing replacement logic 
from TCM. Just to make sure this is stated explicitly: my proposal is only 
concerned with the second copy phase, without any implications for the first.

Thank you,
--Alex

On Fri, Apr 5, 2024, at 12:46 PM, Venkata Hari Krishna Nukala wrote:
> Hi all,
> 
> I have filed CEP-40 [1] for live migrating Cassandra instances using the 
> Cassandra Sidecar.
> 
> When someone needs to move all or a portion of the Cassandra nodes belonging 
> to a cluster to different hosts, the traditional approach of Cassandra node 
> replacement can be time-consuming due to repairs and the bootstrapping of new 
> nodes. Depending on the volume of the storage service load, replacements 
> (repair + bootstrap) may take anywhere from a few hours to days.
> 
> Proposing a Sidecar based solution to address these challenges. This solution 
> proposes transferring data from the old host (source) to the new host 
> (destination) and then bringing up the Cassandra process at the destination, 
> to enable fast instance migration. This approach would help to minimise node 
> downtime, as it is based on a Sidecar solution for data transfer and avoids 
> repairs and bootstrap.
> 
> Looking forward to the discussions.
> 
> [1] 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
> 
> Thanks!
> Hari
