Thanks, Jon and Scott, for taking the time to go through this CEP and
provide input.

I completely agree with what Scott mentioned earlier (I should have added
more details to the CEP). Adding a few more points to the same.

A Sidecar-based solution can make the migration easy without depending on
rsync. At least in the cases I have seen, rsync is not enabled by default,
and most operators want to run OS images with as few requirements as
possible. Installing rsync requires admin privileges, and syncing data with
it is a manual operation. If an API is provided by the Sidecar, then
tooling can be built around it, reducing the scope for manual errors.
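To illustrate the kind of tooling that could be built around such an API, here is a minimal sketch of a digest computed over file names and their lengths, as the first iteration of the CEP describes. This is a hypothetical illustration, not the Sidecar's actual API or digest format:

```python
import hashlib
import os

def file_list_digest(directory):
    """Compute a digest over (relative path, length) pairs for every file
    under `directory`. Hypothetical sketch; the CEP's actual digest API may
    differ. Sorting makes the digest deterministic across hosts.

    Note: because only names and lengths are hashed, a same-size in-place
    change (e.g. a single-SSTable uplevel mutating stats metadata) would
    not change this digest -- exactly the gap discussed in this thread.
    """
    h = hashlib.sha256()
    for root, _, files in sorted(os.walk(directory)):
        for name in sorted(files):
            path = os.path.join(root, name)
            rel = os.path.relpath(path, directory)
            size = os.path.getsize(path)
            h.update(f"{rel}:{size}\n".encode())
    return h.hexdigest()
```

Comparing the source's digest with the destination's digest would then be a single API round trip instead of a manual file-by-file check.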

Performance-wise, at least in the cases I have seen, the File Streaming
API in the Sidecar performs a lot better. To give an idea of the
performance, I would like to quote "up to 7 Gbps/instance writes (depending
on hardware)" from CEP-28, as this CEP proposes to leverage the same API.

For:

>When enabled for LCS, single sstable uplevel will mutate only the level of
an SSTable in its stats metadata component, which wouldn't alter the
filename and may not alter the length of the stats metadata component. A
change to the level of an SSTable on the source via single sstable uplevel
may not be caught by a digest based only on filename and length.

In this case the file size may not change, but the last-modified timestamp
would, right? This is addressed in section MIGRATING ONE INSTANCE, point
2.b.ii, which says "If a file is present at the destination but did not
match (by size or timestamp) with the source file, then local file is
deleted and added to list of files to download." After the final data copy
task downloads it, the file should match the source.
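The check described in point 2.b.ii could be sketched roughly as follows. This is a hypothetical helper (the function name, the metadata shape, and the integer-second timestamp comparison are all my assumptions), not the Sidecar's actual implementation:

```python
import os

def files_to_download(source_meta, dest_dir):
    """Given source metadata as {relative_path: (size, mtime_seconds)},
    return the files the destination must (re)download. A file already at
    the destination is kept only if both size and timestamp match the
    source; otherwise it is deleted locally and queued for download.
    Hypothetical sketch of point 2.b.ii of the CEP."""
    to_download = []
    for rel, (size, mtime) in source_meta.items():
        local = os.path.join(dest_dir, rel)
        if os.path.exists(local):
            st = os.stat(local)
            if st.st_size == size and int(st.st_mtime) == int(mtime):
                continue  # matches source; no transfer needed
            os.remove(local)  # mismatch by size or timestamp: delete
        to_download.append(rel)
    return to_download
```

Including the timestamp in the comparison is what catches the single-SSTable-uplevel case Scott raised, where the size alone would not change.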

On Thu, Apr 11, 2024 at 7:30 AM C. Scott Andreas <sc...@paradoxica.net>
wrote:

> Oh, one note on this item:
>
> >  The operator can ensure that files in the destination matches with the
> source. In the first iteration of this feature, an API is introduced to
> calculate digest for the list of file names and their lengths to identify
> any mismatches. It does not validate the file contents at the binary level,
> but, such feature can be added at a later point of time.
>
> When enabled for LCS, single sstable uplevel will mutate only the level of
> an SSTable in its stats metadata component, which wouldn't alter the
> filename and may not alter the length of the stats metadata component. A
> change to the level of an SSTable on the source via single sstable uplevel
> may not be caught by a digest based only on filename and length.
>
> Including the file’s modification timestamp would address this without
> requiring a deep hash of the data. This would be good to include to ensure
> SSTables aren’t downleveled unexpectedly during migration.
>
> - Scott
>
> On Apr 8, 2024, at 2:15 PM, C. Scott Andreas <sc...@paradoxica.net> wrote:
>
> 
> Hi Jon,
>
> Thanks for taking the time to read and reply to this proposal. Would
> encourage you to approach it from an attitude of seeking understanding on
> the part of the first-time CEP author, as this reply casts it off pretty
> quickly as NIH.
>
> The proposal isn't mine, but I'll offer a few notes on where I see this as
> valuable:
>
> – It's valuable for Cassandra to have an ecosystem-native mechanism of
> migrating data between physical/virtual instances outside the standard
> streaming path. As Hari mentions, the current ecosystem-native approach of
> executing repairs, decommissions, and bootstraps is time-consuming and
> cumbersome.
>
> – An ecosystem-native solution is safer than a bunch of bash and rsync.
> Defining a safe protocol to migrate data between instances via rsync
> without downtime is surprisingly difficult - and even more so to do safely
> and repeatedly at scale. Enabling this process to be orchestrated by a
> control plane mechanizing official endpoints of the database and sidecar –
> rather than trying to move data around behind its back – is much safer than
> hoping one's cobbled together the right set of scripts to move data in a
> way that won't violate strong / transactional consistency guarantees. This
> complexity is kind of exemplified by the "Migrating One Instance" section
> of the doc and state machine diagram, which illustrates an approach to
> solving that problem.
>
> – An ecosystem-native approach poses fewer security concerns than rsync.
> mTLS-authenticated endpoints in the sidecar for data movement eliminate the
> requirement for orchestration to occur via (typically) high-privilege SSH,
> which often allows for code execution of some form or complex efforts to
> scope SSH privileges of particular users; and eliminates the need to manage
> and secure rsyncd processes on each instance if not via SSH.
>
> – An ecosystem-native approach is more instrumentable and measurable than
> rsync. Support for data migration endpoints in the sidecar would allow for
> metrics reporting, stats collection, and alerting via mature and modern
> mechanisms rather than monitoring the output of a shell script.
>
> I'll yield to Hari to share more, though today is a public holiday in
> India.
>
> I do see this CEP as solving an important problem.
>
> Thanks,
>
> – Scott
>
> On Apr 8, 2024, at 10:23 AM, Jon Haddad <j...@jonhaddad.com> wrote:
>
>
> This seems like a lot of work to create an rsync alternative.  I can't
> really say I see the point.  I noticed your "rejected alternatives"
> mentions it with this note:
>
>
>    - However, it might not be permitted by the administrator or available
>    in various environments such as Kubernetes or virtual instances like EC2.
>    Enabling data transfer through a sidecar facilitates smooth instance
>    migration.
>
> This feels more like NIH than solving a real problem, as what you've
> listed is a hypothetical, and one that's easily addressed.
>
> Jon
>
>
>
> On Fri, Apr 5, 2024 at 3:47 AM Venkata Hari Krishna Nukala <
> n.v.harikrishna.apa...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have filed CEP-40 [1] for live migrating Cassandra instances using the
>> Cassandra Sidecar.
>>
>> When someone needs to move all or a portion of the Cassandra nodes
>> belonging to a cluster to different hosts, the traditional approach of
>> Cassandra node replacement can be time-consuming due to repairs and the
>> bootstrapping of new nodes. Depending on the volume of the storage service
>> load, replacements (repair + bootstrap) may take anywhere from a few hours
>> to days.
>>
>> Proposing a Sidecar based solution to address these challenges. This
>> solution proposes transferring data from the old host (source) to the new
>> host (destination) and then bringing up the Cassandra process at the
>> destination, to enable fast instance migration. This approach would help to
>> minimise node downtime, as it is based on a Sidecar solution for data
>> transfer and avoids repairs and bootstrap.
>>
>> Looking forward to the discussions.
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
>>
>> Thanks!
>> Hari
>>
>
>
