Thank you for the input! 

> Would it be possible to create a new type of write target node?  The new 
> write target node is notified of writes (like any other write node) but does 
> not participate in the write availability calculation. 

We could introduce some kind of optional write, but unfortunately that way we 
cannot codify our consistency level. Since we already use the notion of pending 
ranges, which requires 1 extra ack, and we as a community are OK with it, I 
think for simplicity we should stick to the same notion. 
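
For illustration, a minimal sketch of the ack arithmetic I have in mind (the 
names are hypothetical, not the actual internals):

    // For a QUORUM write, every pending replica adds one mandatory ack
    // on top of the regular quorum.
    static int requiredWriteAcks(int replicationFactor, int pendingReplicas) {
        int quorum = replicationFactor / 2 + 1;  // e.g. RF=3 -> quorum of 2
        return quorum + pendingReplicas;         // one pending replica -> 2 + 1 = 3
    }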

If there is a lot of interest in this kind of availability/durability tradeoff, 
we should discuss all implications in a separate CEP, but then it probably 
would make sense to make it available for all operations. 

My personal opinion is that if we can't guarantee/rely on the number of acks, 
we may accidentally mislead people: they would expect it to work and be 
surprised when it does not.

On Wed, May 1, 2024, at 4:38 PM, Claude Warren, Jr via dev wrote:
> Alex,
> 
>  you write:
>> We can implement CEP-40 using a similar approach: we can leave the source 
>> node as both a read and write target, and allow the new node to be a target 
>> for (pending) writes. Unfortunately, this does not help with availability 
>> (in fact, it decreases write availability, since we will have to collect 2+1 
>> mandatory write responses instead of just 2), but increases durability, and 
>> I think helps to fully eliminate the second phase. This also increases read 
>> availability when the source node is up, since we can still use the source 
>> node as a part of read quorum.
> 
> Would it be possible to create a new type of write target node?  The new 
> write target node is notified of writes (like any other write node) but does 
> not participate in the write availability calculation.  In this way a node 
> that is being migrated to could receive writes while having minimal impact on 
> the current operation of the cluster?
> 
> Claude
> 
> 
> 
> On Wed, May 1, 2024 at 12:33 PM Alex Petrov <al...@coffeenco.de> wrote:
>> Thank you for submitting this CEP!
>> 
>> Wanted to discuss this point from the description:
>> 
>> > How to bring up/down Cassandra/Sidecar instances or making/applying config 
>> > changes are outside the scope of this document.
>> 
>> One advantage of doing migration via sidecar is the fact that we can stream 
>> sstables to the target node from the source node while the source node is 
>> down. Also, if the source node is down, it does not matter that we can’t use 
>> it as a write target. However, if we are replacing a live node, we do lose 
>> both durability and availability during the second copy phase. There are 
>> copious other advantages described by others in the thread above.
>> 
>> For example, we have three adjacent nodes A, B, C and simple RF 3. C (source) 
>> is up and is being replaced with live-migrated D (destination). According to 
>> the process described in CEP-40, we perform streaming in 2 phases: the first 
>> is a full copy (similar to bootstrap/replacement in Cassandra), and the 
>> second is just a diff. The second phase is still going to take a non-trivial 
>> amount of time, and is likely to last at the very least minutes. During this 
>> time, we only have nodes A and B as both read and write targets, with no 
>> alternatives: we have to have both of them present for any operation, and 
>> losing either one of them leaves us with only one copy of the data.
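>> 
>> To put rough numbers on that (a hypothetical helper, just to illustrate the 
>> arithmetic, not anything from the CEP):
>> 
>>     // How many replica failures a QUORUM operation can still tolerate.
>>     static int failuresTolerated(int liveReplicas, int replicationFactor) {
>>         int quorum = replicationFactor / 2 + 1;     // RF=3 -> 2
>>         return Math.max(0, liveReplicas - quorum);  // 2 live -> 0, 3 live -> 1
>>     }
>> 
>> With only A and B live during the second phase, failuresTolerated(2, 3) == 0: 
>> losing either node makes QUORUM operations fail.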
>> 
>> To contrast this, the TCM bootstrap process has 4 steps: between the old 
>> owner being phased out and the new owner being brought in, we always ensure 
>> r/w quorum consistency and the liveness of at least 2 nodes for the read 
>> quorum (3 nodes available for reads in the best case), and 2+1 acks (quorum 
>> plus the pending replica) for the write quorum, with 4 nodes (3 existing 
>> owners + 1 pending) available for writes in the best case. Replacement in 
>> TCM is implemented similarly, with the old node remaining an (unavailable) 
>> read target, but the new node already being the target for (pending) writes.
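>> 
>> Schematically (hypothetical types, not the actual TCM code), the replica 
>> sets during such a replacement look like this:
>> 
>>     import java.util.Set;
>> 
>>     // Natural replicas serve reads and writes; pending replicas receive
>>     // writes and add mandatory acks, but never count towards the read quorum.
>>     record ReplicaSets(Set<String> natural, Set<String> pending) {
>>         int readQuorum() { return natural.size() / 2 + 1; }
>>         int writeAcks()  { return natural.size() / 2 + 1 + pending.size(); }
>>     }
>> 
>>     // Replacing C with D: reads need 2 of {A, B, C}, writes need 2+1 acks
>>     // out of the 4 candidates {A, B, C, D}.
>>     ReplicaSets replacement = new ReplicaSets(Set.of("A", "B", "C"), Set.of("D"));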
>> 
>> We can implement CEP-40 using a similar approach: we can leave the source 
>> node as both a read and write target, and allow the new node to be a target 
>> for (pending) writes. Unfortunately, this does not help with availability 
>> (in fact, it decreases write availability, since we will have to collect 2+1 
>> mandatory write responses instead of just 2), but increases durability, and 
>> I think helps to fully eliminate the second phase. This also increases read 
>> availability when the source node is up, since we can still use the source 
>> node as a part of read quorum.
>> 
>> I think if we want to call this feature "live migration", we may want to 
>> provide similar guarantees, since this term is used in the hypervisor 
>> community to describe an instant and uninterrupted migration of an instance 
>> from one host to another, without the guest instance being able to notice as 
>> much as a time jump. 
>> 
>> I am also not against having this done post-factum, after the CEP is 
>> implemented in its current form, but I think it would be good to have a 
>> clear understanding of the availability and durability guarantees we want to 
>> provide with it, and to state them explicitly for both the "source node 
>> down" and "source node up" cases. That said, since we will have to integrate 
>> CEP-40 with TCM, and will have to ensure the correctness of sstable diffing 
>> for the second phase, it might make sense to consider reusing some of the 
>> existing replacement logic from TCM. Just to make sure this is mentioned 
>> explicitly: my proposal is only concerned with the second copy phase, 
>> without any implications for the first.
>> 
>> Thank you,
>> --Alex
>> 
>> On Fri, Apr 5, 2024, at 12:46 PM, Venkata Hari Krishna Nukala wrote:
>>> Hi all,
>>> 
>>> I have filed CEP-40 [1] for live migrating Cassandra instances using the 
>>> Cassandra Sidecar.
>>> 
>>> When someone needs to move all or a portion of the Cassandra nodes 
>>> belonging to a cluster to different hosts, the traditional approach of 
>>> Cassandra node replacement can be time-consuming due to repairs and the 
>>> bootstrapping of new nodes. Depending on the volume of the storage service 
>>> load, replacements (repair + bootstrap) may take anywhere from a few hours 
>>> to days.
>>> 
>>> I am proposing a Sidecar-based solution to address these challenges. It 
>>> transfers data from the old host (source) to the new host (destination) and 
>>> then brings up the Cassandra process at the destination, enabling fast 
>>> instance migration. This approach would help minimise node downtime, as it 
>>> relies on the Sidecar for data transfer and avoids repairs and bootstrap.
>>> 
>>> Looking forward to the discussions.
>>> 
>>> [1] 
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
>>> 
>>> Thanks!
>>> Hari
>> 
