Jeff, this is probably the best explanation and justification of the idea that I've heard so far.
I like it because:

1) we really should have something official for backups
2) backups / object store would be great for analytics
3) it solves a much bigger problem than the single goal of moving instances.

I'm a huge +1 in favor of this perspective, with live migration being one use case for backup / restore.

Jon

On Fri, Apr 19, 2024 at 7:08 PM Jeff Jirsa <jji...@gmail.com> wrote:

> I think Jordan and German had an interesting insight, or at least their
> comment made me think about this slightly differently, and I’m going to
> repeat it so it’s not lost in the discussion about zerocopy / sendfile.
>
> The CEP treats this as “move a live instance from one machine to another”.
> I know why the author wants to do this.
>
> If you think of it instead as “change backup/restore mechanism to be able
> to safely restore from a running instance”, you may end up with a cleaner
> abstraction that’s easier to think about (and may also be easier to
> generalize in clouds where you have other tools available).
>
> I’m not familiar enough with the sidecar to know the state of
> orchestration for backup/restore, but “ensure the original source node
> isn’t running”, “migrate the config”, “choose and copy a snapshot”, maybe
> “forcibly exclude the original instance from the cluster” are all things
> the restore code is going to need to do anyway, and if restore doesn’t do
> that today, it seems like we can solve it once.
>
> Backup probably needs to be generalized to support many sources, too.
> Object storage is obvious (S3 download). Block storage is obvious (snapshot
> and reattach). Reading sstables from another sidecar seems reasonable, too.
>
> It accomplishes the original goal, in largely the same fashion, it just
> makes the logic reusable for other purposes?
>
> On Apr 19, 2024, at 5:52 PM, Dinesh Joshi <djo...@apache.org> wrote:
>
> On Thu, Apr 18, 2024 at 12:46 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
>
>> If there is a faster/better way to replace a node why not have Cassandra
>> support that natively without the sidecar so people who aren’t running the
>> sidecar can benefit?
>
> I am not the author of the CEP so take whatever I say with a pinch of
> salt. Scott and Jordan have pointed out some benefits of doing this in the
> Sidecar vs Cassandra.
>
> Today Cassandra is able to do fast node replacements. However, this CEP is
> addressing an important corner case when Cassandra is unable to start up
> due to old / ailing hardware. Can we fix it in Cassandra so it doesn't die
> on old hardware? Sure. However, you would still need operator intervention
> to start it up in some special mode both on the old and new node so the new
> node can peer with the old node, copy over its data and join the ring. This
> would still require some orchestration outside the database. The Sidecar
> can do that orchestration for the operator. The point I'm making here is
> that the CEP addresses a real issue. The way it is currently built can
> improve over time with improvements in Cassandra.
>
> Dinesh