I really do like the framing of replacing a node as restoring a node and
then kicking off a replace. That is effectively what we do internally.

I also agree we should be able to do data movement well, both internally
to Cassandra and externally, for a variety of reasons.

We’ve seen great performance with “ZCS+TLS” even though it’s not full zero
copy: nodes that previously took *days* to replace now take a few hours.
But we have also seen it put pressure on nodes and drive up latencies,
which is the main reason we still rely on an external data movement system
by default, falling back to ZCS+TLS as needed.

Jordan

On Fri, Apr 19, 2024 at 19:15 Jon Haddad <j...@jonhaddad.com> wrote:

> Jeff, this is probably the best explanation and justification of the idea
> that I've heard so far.
>
> I like it because:
>
> 1) we really should have something official for backups
> 2) backups / object store would be great for analytics
> 3) it solves a much bigger problem than the single goal of moving
> instances.
>
> I'm a huge +1 in favor of this perspective, with live migration being one
> use case for backup / restore.
>
> Jon
>
>
> On Fri, Apr 19, 2024 at 7:08 PM Jeff Jirsa <jji...@gmail.com> wrote:
>
>> I think Jordan and German had an interesting insight, or at least their
>> comment made me think about this slightly differently, and I’m going to
>> repeat it so it’s not lost in the discussion about zerocopy / sendfile.
>>
>> The CEP treats this as “move a live instance from one machine to
>> another”. I know why the author wants to do this.
>>
>> If you think of it instead as “change the backup/restore mechanism to be
>> able to safely restore from a running instance”, you may end up with a
>> cleaner abstraction that’s easier to think about (and may also be easier
>> to generalize in clouds where you have other tools available).
>>
>> I’m not familiar enough with the sidecar to know the state of
>> orchestration for backup/restore, but “ensure the original source node
>> isn’t running”, “migrate the config”, “choose and copy a snapshot”, and
>> maybe “forcibly exclude the original instance from the cluster” are all
>> things the restore code is going to need to do anyway. If restore doesn’t
>> do that today, it seems like we can solve it once.
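>>
>> As a rough sketch of what I mean (every name below is hypothetical; I
>> don’t know what the sidecar actually exposes today), those steps could
>> sit behind one reusable restore interface that a live migration simply
>> drives in order:
>>
>>     // Hypothetical sketch only; none of these types exist in the Sidecar.
>>     import java.io.IOException;
>>     import java.nio.file.Path;
>>
>>     interface RestoreOrchestrator {
>>         /** Confirm the source Cassandra process is stopped (or fence it). */
>>         void ensureSourceNotRunning(String sourceHost) throws IOException;
>>
>>         /** Copy cassandra.yaml, topology settings, and tokens to the target. */
>>         void migrateConfig(String sourceHost, Path targetConfigDir) throws IOException;
>>
>>         /** Pick a consistent snapshot and copy it into the target data dirs. */
>>         void copySnapshot(String sourceHost, Path targetDataDir) throws IOException;
>>
>>         /** Optionally force-remove the original instance from the cluster. */
>>         void excludeOriginalInstance(String sourceHostId) throws IOException;
>>     }
>>
>>     /** Runs the steps in order; live migration is just one caller of this. */
>>     final class RestoreRunner {
>>         static void restore(RestoreOrchestrator o, String srcHost, String srcHostId,
>>                             Path confDir, Path dataDir) throws IOException {
>>             o.ensureSourceNotRunning(srcHost);
>>             o.migrateConfig(srcHost, confDir);
>>             o.copySnapshot(srcHost, dataDir);
>>             o.excludeOriginalInstance(srcHostId);
>>             // The target node then starts with the restored config and data.
>>         }
>>     }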
>>
>> Backup probably needs to be generalized to support many sources, too.
>> Object storage is obvious (S3 download). Block storage is obvious
>> (snapshot and reattach). Reading sstables from another sidecar seems
>> reasonable as well.
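>>
>> To illustrate the generalization (again, purely hypothetical names), the
>> restore path could code against a single source interface, with the
>> storage-specific pieces plugging in underneath:
>>
>>     // Hypothetical sketch only; none of these types exist in the Sidecar.
>>     import java.io.IOException;
>>     import java.io.InputStream;
>>     import java.nio.file.Files;
>>     import java.nio.file.Path;
>>     import java.util.List;
>>     import java.util.stream.Collectors;
>>     import java.util.stream.Stream;
>>
>>     /** Anywhere sstables can be restored from, regardless of backing store. */
>>     interface SSTableSource {
>>         List<Path> listSSTables() throws IOException;
>>         InputStream open(Path sstable) throws IOException;
>>     }
>>
>>     /** The block storage case: a snapshot volume already reattached locally. */
>>     final class AttachedVolumeSource implements SSTableSource {
>>         private final Path mountPoint;
>>
>>         AttachedVolumeSource(Path mountPoint) { this.mountPoint = mountPoint; }
>>
>>         @Override
>>         public List<Path> listSSTables() throws IOException {
>>             // Find data components under the reattached volume's mount point.
>>             try (Stream<Path> files = Files.walk(mountPoint)) {
>>                 return files.filter(p -> p.toString().endsWith("-Data.db"))
>>                             .collect(Collectors.toList());
>>             }
>>         }
>>
>>         @Override
>>         public InputStream open(Path sstable) throws IOException {
>>             return Files.newInputStream(sstable);
>>         }
>>     }
>>     // An S3 download source and a peer-sidecar streaming source would
>>     // implement the same interface, so restore logic stays source-agnostic.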
>>
>> It accomplishes the original goal in largely the same fashion; it just
>> makes the logic reusable for other purposes?
>>
>> On Apr 19, 2024, at 5:52 PM, Dinesh Joshi <djo...@apache.org> wrote:
>>
>>
>> On Thu, Apr 18, 2024 at 12:46 PM Ariel Weisberg <ar...@weisberg.ws>
>> wrote:
>>
>>>
>>> If there is a faster/better way to replace a node, why not have
>>> Cassandra support that natively, without the sidecar, so people who
>>> aren’t running the sidecar can benefit?
>>>
>>
>> I am not the author of the CEP, so take whatever I say with a pinch of
>> salt. Scott and Jordan have pointed out some benefits of doing this in
>> the Sidecar versus in Cassandra.
>>
>> Today Cassandra is able to do fast node replacements. However, this CEP
>> addresses an important corner case: when Cassandra is unable to start up
>> due to old or ailing hardware. Can we fix it in Cassandra so it doesn't
>> die on old hardware? Sure. However, you would still need operator
>> intervention to start it up in some special mode on both the old and new
>> nodes so the new node can peer with the old node, copy over its data, and
>> join the ring. That would still require some orchestration outside the
>> database, and the Sidecar can do that orchestration for the operator. The
>> point I'm making here is that the CEP addresses a real issue, and the way
>> it is currently built can improve over time with improvements in
>> Cassandra.
>>
>> Dinesh
>>
