Jeff, this is probably the best explanation and justification of the idea
that I've heard so far.

I like it because

1) we really should have something official for backups
2) backups / object store would be great for analytics
3) it solves a much bigger problem than the single goal of moving instances.

I'm a huge +1 in favor of this perspective, with live migration being one
use case for backup / restore.

Jon


On Fri, Apr 19, 2024 at 7:08 PM Jeff Jirsa <jji...@gmail.com> wrote:

> I think Jordan and German had an interesting insight, or at least their
> comment made me think about this slightly differently, and I’m going to
> repeat it so it’s not lost in the discussion about zerocopy / sendfile.
>
> The CEP treats this as “move a live instance from one machine to another”.
> I know why the author wants to do this.
>
> If you think of it instead as “change backup/restore mechanism to be able
> to safely restore from a running instance”, you may end up with a cleaner
> abstraction that’s easier to think about (and may also be easier to
> generalize in clouds where you have other tools available).
>
> I’m not familiar enough with the sidecar to know the state of
> orchestration for backup/restore, but “ensure the original source node
> isn’t running”, “migrate the config”, “choose and copy a snapshot”, maybe
> “forcibly exclude the original instance from the cluster” are all things
> the restore code is going to need to do anyway, and if restore doesn’t do
> that today, it seems like we can solve it once.
>
> Backup probably needs to be generalized to support many sources, too.
> Object storage is obvious (s3 download). Block storage is obvious (snapshot
> and reattach). Reading sstables from another sidecar seems reasonable, too.
>
> It accomplishes the original goal in largely the same fashion; it just
> makes the logic reusable for other purposes?
>
> On Apr 19, 2024, at 5:52 PM, Dinesh Joshi <djo...@apache.org> wrote:
>
> 
> On Thu, Apr 18, 2024 at 12:46 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
>
>>
>> If there is a faster/better way to replace a node why not have Cassandra
>> support that natively without the sidecar so people who aren’t running the
>> sidecar can benefit?
>>
>
> I am not the author of the CEP so take whatever I say with a pinch of
> salt. Scott and Jordan have pointed out some benefits of doing this in the
> Sidecar vs Cassandra.
>
> Today Cassandra is able to do fast node replacements. However, this CEP is
> addressing an important corner case when Cassandra is unable to start up
> due to old / ailing hardware. Can we fix it in Cassandra so it doesn't die
> on old hardware? Sure. However, you would still need operator intervention
> to start it up in some special mode both on the old and new node so the new
> node can peer with the old node, copy over its data and join the ring. This
> would still require some orchestration outside the database. The Sidecar
> can do that orchestration for the operator. The point I'm making here is
> that the CEP addresses a real issue. The way it is currently built can
> improve over time with improvements in Cassandra.
>
> Dinesh
>
>
