On Wed, May 1, 2024 at 3:34 AM Alex Petrov <al...@coffeenco.de> wrote:

>
> We can implement CEP-40 using a similar approach: we can leave the source
> node as both a read and write target, and allow the new node to be a target
> for (pending) writes. Unfortunately, this does not help with availability
> (in fact, it decreases write availability, since we will have to collect
> 2+1 mandatory write responses instead of just 2), but increases durability,
> and I think helps to fully eliminate the second phase. This also increases
> read availability when the source node is up, since we can still use the
> source node as a part of read quorum.
>
>
I 100% agree that this is the more durable approach, and that bringing the
source node down reduces availability during the second phase. From a pure
correctness perspective, my inclination is that it would be better to
implement the logic in the manner you describe. That said, loss of
availability of the r/w quorum is rare in my experience. Having run a setup
like the one CEP-40 currently describes (but using S3 for the file transfer)
for over 3 years, I have a hard time remembering a single incident of it in
practice. I'm sure it's happened, but at the rate we replace hardware it's
not something we deal with regularly, despite taking the risk. I also agree
that it needs to be well documented, as surprising edge cases are never fun.
I think the existing and future TCM implementations cover the more
conservative/correct case, and having this option as an alternative, or for
when the instance is unable to bring up the C* process, is good to have.
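
To put rough numbers on the "2+1 instead of just 2" point quoted above, here
is a minimal Python sketch. It is not Cassandra code. It assumes RF=3 with
QUORUM writes, one pending replica during the migration, independent node
failures, and that any sufficient number of acks from the natural plus
pending replicas satisfies the write. The function names and the 0.99
per-node uptime are made up for illustration.

```python
from math import comb


def quorum(rf: int) -> int:
    """Quorum size for a given replication factor."""
    return rf // 2 + 1


def write_acks_required(rf: int, pending: int) -> int:
    """Acks a QUORUM write waits for: quorum plus one per pending replica."""
    return quorum(rf) + pending


def availability(nodes: int, required: int, p_up: float) -> float:
    """Probability that at least `required` of `nodes` nodes are up,
    assuming each node is up independently with probability `p_up`."""
    return sum(
        comb(nodes, k) * p_up**k * (1 - p_up) ** (nodes - k)
        for k in range(required, nodes + 1)
    )


rf = 3
p_up = 0.99  # hypothetical per-node uptime, for illustration only

# Steady state: 2 acks out of 3 replicas.
steady = availability(rf, write_acks_required(rf, 0), p_up)
# During migration: 2+1 acks out of 3 natural + 1 pending replicas.
migrating = availability(rf + 1, write_acks_required(rf, 1), p_up)

print(f"steady-state write availability: {steady:.6f}")
print(f"write availability with a pending replica: {migrating:.6f}")
```

Under these assumptions the write path still tolerates one node being down,
but there are now four nodes that can fail instead of three, so write
availability drops slightly while the pending replica exists.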



> On Fri, Apr 5, 2024, at 12:46 PM, Venkata Hari Krishna Nukala wrote:
>
> Hi all,
>
> I have filed CEP-40 [1] for live migrating Cassandra instances using the
> Cassandra Sidecar.
>
> When someone needs to move all or a portion of the Cassandra nodes
> belonging to a cluster to different hosts, the traditional approach of
> Cassandra node replacement can be time-consuming due to repairs and the
> bootstrapping of new nodes. Depending on the volume of the storage service
> load, replacements (repair + bootstrap) may take anywhere from a few hours
> to days.
>
> Proposing a Sidecar based solution to address these challenges. This
> solution proposes transferring data from the old host (source) to the new
> host (destination) and then bringing up the Cassandra process at the
> destination, to enable fast instance migration. This approach would help to
> minimise node downtime, as it is based on a Sidecar solution for data
> transfer and avoids repairs and bootstrap.
>
> Looking forward to the discussions.
>
> [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
>
> Thanks!
> Hari
>
>
>
