We use backup/restore for our implementation of this concept. It has the added benefit that the backup/restore path gets exercised much more regularly than it would in normal operations, so edge-case bugs surface at a time when you still have other ways of recovering rather than in a full disaster scenario.
Cheers
Ben

From: Jordan West <jorda...@gmail.com>
Date: Sunday, 21 April 2024 at 05:38
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

I do really like the framing of replacing a node is restoring a node and then kicking off a replace. That is effectively what we do internally.

I also agree we should be able to do data movement well both internal to Cassandra and externally for a variety of reasons.

We’ve seen great performance with “ZCS+TLS” even though it’s not full zero copy: nodes that previously took *days* to replace now take a few hours. But we have seen it put pressure on nodes and drive up latencies, which is the main reason we still rely on an external data movement system by default, falling back to ZCS+TLS as needed.

Jordan

On Fri, Apr 19, 2024 at 19:15 Jon Haddad <j...@jonhaddad.com> wrote:

Jeff, this is probably the best explanation and justification of the idea that I've heard so far.

I like it because
1) we really should have something official for backups
2) backups / object store would be great for analytics
3) it solves a much bigger problem than the single goal of moving instances.

I'm a huge +1 in favor of this perspective, with live migration being one use case for backup / restore.

Jon

On Fri, Apr 19, 2024 at 7:08 PM Jeff Jirsa <jji...@gmail.com> wrote:

I think Jordan and German had an interesting insight, or at least their comment made me think about this slightly differently, and I’m going to repeat it so it’s not lost in the discussion about zerocopy / sendfile.

The CEP treats this as “move a live instance from one machine to another”. I know why the author wants to do this.
If you think of it instead as “change backup/restore mechanism to be able to safely restore from a running instance”, you may end up with a cleaner abstraction that’s easier to think about (and may also be easier to generalize in clouds where you have other tools available).

I’m not familiar enough with the sidecar to know the state of orchestration for backup/restore, but “ensure the original source node isn’t running”, “migrate the config”, “choose and copy a snapshot”, maybe “forcibly exclude the original instance from the cluster” are all things the restore code is going to need to do anyway, and if restore doesn’t do that today, it seems like we can solve it once.

Backup probably needs to be generalized to support many sources, too. Object storage is obvious (s3 download). Block storage is obvious (snapshot and reattach). Reading sstables from another sidecar seems reasonable, too.

It accomplishes the original goal, in largely the same fashion, it just makes the logic reusable for other purposes?

On Apr 19, 2024, at 5:52 PM, Dinesh Joshi <djo...@apache.org> wrote:

On Thu, Apr 18, 2024 at 12:46 PM Ariel Weisberg <ar...@weisberg.ws> wrote:

If there is a faster/better way to replace a node why not have Cassandra support that natively without the sidecar so people who aren’t running the sidecar can benefit?

I am not the author of the CEP so take whatever I say with a pinch of salt. Scott and Jordan have pointed out some benefits of doing this in the Sidecar vs Cassandra.

Today Cassandra is able to do fast node replacements. However, this CEP is addressing an important corner case when Cassandra is unable to start up due to old / ailing hardware. Can we fix it in Cassandra so it doesn't die on old hardware? Sure. However, you would still need operator intervention to start it up in some special mode both on the old and new node so the new node can peer with the old node, copy over its data and join the ring.
This would still require some orchestration outside the database. The Sidecar can do that orchestration for the operator.

The point I'm making here is that the CEP addresses a real issue. The way it is currently built can improve over time with improvements in Cassandra.

Dinesh
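
For readers following the thread, the restore-centric framing Jeff describes (one restore path, several snapshot sources) could look roughly like the Python sketch below. It is an illustration only: every class, function, and string here (`SnapshotSource`, `ObjectStoreSource`, `BlockVolumeSource`, `PeerSidecarSource`, `plan_restore`, the `cassandra_running` flag) is hypothetical and is not part of the Cassandra or Sidecar APIs, and the steps only describe work rather than perform it.

```python
from dataclasses import dataclass
from typing import List, Protocol


class SnapshotSource(Protocol):
    """Hypothetical interface: any place a restore can copy a snapshot from."""

    def fetch(self, snapshot: str, dest: str) -> str:
        """Return a description of the copy step (a real one would move data)."""
        ...


@dataclass
class ObjectStoreSource:
    bucket: str  # e.g. a bucket holding uploaded snapshots

    def fetch(self, snapshot: str, dest: str) -> str:
        return f"download {self.bucket}/{snapshot} to {dest}"


@dataclass
class BlockVolumeSource:
    volume_id: str  # a volume that can be snapshotted and reattached

    def fetch(self, snapshot: str, dest: str) -> str:
        return f"reattach {self.volume_id}@{snapshot} at {dest}"


@dataclass
class PeerSidecarSource:
    host: str
    cassandra_running: bool = False  # stand-in for a real liveness check

    def fetch(self, snapshot: str, dest: str) -> str:
        return f"stream sstables of {snapshot} from {self.host} to {dest}"


def plan_restore(source: SnapshotSource, snapshot: str, dest: str,
                 exclude_original: bool = False) -> List[str]:
    """Build the ordered restore steps; the same plan serves every source."""
    steps: List[str] = []
    if isinstance(source, PeerSidecarSource):
        # Live-migration case: the original node must not be running.
        if source.cassandra_running:
            raise RuntimeError(f"{source.host} is still running Cassandra")
        steps.append(f"confirm {source.host} is stopped")
    steps.append("migrate config")
    steps.append(source.fetch(snapshot, dest))
    if exclude_original:
        steps.append("exclude original instance from the cluster")
    return steps


if __name__ == "__main__":
    source = PeerSidecarSource("old-node")
    for step in plan_restore(source, "snap-1", "/var/lib/cassandra"):
        print(step)
```

Under these assumptions, live migration is just `plan_restore` called with a peer-sidecar source, while an ordinary restore from object or block storage reuses the same function with a different source.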