I haven't looked at streaming over TLS, so I might be way off base here,
but our own docs (
https://cassandra.apache.org/doc/latest/cassandra/architecture/streaming.html)
say ZCS is not available when using encryption, and if we have to bring the
data into the JVM then I'm not sure how it would even work.  sendfile is a
direct file descriptor to file descriptor copy.  How are we simultaneously
doing kernel-only operations while also performing encryption in the JVM?
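
To make the distinction concrete, here's a rough Java sketch of the two paths as I understand them. This is not Cassandra's streaming code; the class and method names are made up for illustration. The first method hands the copy to FileChannel.transferTo, which the JDK may implement with sendfile when the target is a socket. The second pulls the bytes into a buffer in the process, which is the only place a userspace TLS implementation could encrypt them.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class StreamPaths {

    // Kernel-assisted path: the JVM asks the OS to move the bytes.
    // Whether this becomes sendfile depends on the platform and the
    // target channel type; the file contents are not handed to our code.
    static long zeroCopy(Path file, WritableByteChannel out) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long pos = 0;
            long size = in.size();
            while (pos < size) {
                // transferTo may move fewer bytes than requested, so loop.
                pos += in.transferTo(pos, size - pos, out);
            }
            return pos;
        }
    }

    // Userspace path: bytes are read into a buffer owned by the process
    // and then written out again. Encryption in the JVM would have to
    // happen between the read and the write.
    static long copyThroughUserspace(Path file, WritableByteChannel out) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
            long total = 0;
            while (in.read(buf) != -1) {
                buf.flip();
                // A TLS layer would transform the buffer contents here.
                while (buf.hasRemaining()) {
                    total += out.write(buf);
                }
                buf.clear();
            }
            return total;
        }
    }
}
```

Both methods deliver the same bytes. The difference is only whether the data ever becomes visible to the process, which is why I don't see how the first one can coexist with encryption done in the JVM.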

I'm assuming you mean something other than ZCS when you say "ZCS with
TLS"?  Maybe "no serde" streaming?

Jon




On Fri, Apr 19, 2024 at 2:36 PM C. Scott Andreas <sc...@paradoxica.net>
wrote:

> These are the salient points here for me, yes:
>
> > My understanding from the proposal is that Sidecar would be able to
> migrate from a Cassandra instance that is already dead and cannot recover.
>
> > That’s one thing I like about having it an external process, not that
> it’s bulletproof but it’s one less thing to worry about.
>
> The manual/rsync version of the state machine Hari describes in the CEP is
> one of the best escape hatches for migrating an instance that’s
> overstressed, limping on ailing hardware, or that has exhausted disk. If
> the system is functional but the C* process is in bad shape, it’s great to
> have a paved-path flow for migrating the instance and data to more capable
> hardware.
>
> I also agree in principle that “streaming should be just as fast via the
> C* process itself.” This hits a couple snags today:
>
> - This option isn’t available when the C* instance is struggling.
> - In the scenario of replacing an entire cluster’s hardware with new
> machines, applying this process to an entire cluster via host replacements
> of all instances (which also requires repairs) or by doubling then halving
> capacity is incredibly cumbersome and operationally-impacting to the
> database’s users - especially if the DB is already having a hard time.
> - The host replacement process also puts a lot of stress on gossip and is
> a great way to encounter all sorts of painful races if you perform it
> hundreds or thousands of times (but shouldn’t be a problem in TCM-world).
>
> So I think I agree with both points:
>
> - Cassandra should be able to do this itself.
> - It is also valuable to have a paved path implementation of a safe
> migration/forklift state machine when you’re in a bind, or need to do this
> hundreds or thousands of times.
>
> On zero copy: what really makes ZCS fast compared to legacy streaming is
> that the JVM is able to ship entire files around, rather than deserializing
> SSTables and reserializing them to stream each individual row. That’s the
> slow and expensive part. It’s true that TLS means you incur an extra memcpy
> as that stream is encrypted before it’s chunked into packets — but the cost
> of that memcpy for encryption pales in comparison to how slow
> deserializing/reserializing SSTables is/was.
>
> ZCS with TLS can push 20Gbps+ today on decent but not extravagant Xeon
> hardware. In-kernel TLS would also still encounter a memcpy in the
> encryption path; the kernel.org doc alludes to this via “the kernel will
> need to allocate a buffer for the encrypted data.” But it would allow using
> sendfile and cut a copy in userspace. If someone is interested in testing
> it out I’d love to learn what they find. It’s always a great surprise to
> learn there’s more perf left on the table. This comparison looks
> promising: https://tinselcity.github.io/SSL_Sendfile/
>
> – Scott
>
> —
> Mobile
>
> On Apr 19, 2024, at 11:31 AM, Jordan West <jorda...@gmail.com> wrote:
>
>
> If we are considering the main process then we have to do some additional
> work to ensure that it doesn’t put pressure on the JVM and introduce
> latency. That’s one thing I like about having it an external process — not
> that it’s bullet proof but it’s one less thing to worry about.
>
> Jordan
>
> On Thu, Apr 18, 2024 at 15:39 Francisco Guerrero <fran...@apache.org>
> wrote:
>
>> My understanding from the proposal is that Sidecar would be able to
>> migrate from a Cassandra instance that is already dead and cannot
>> recover. This is a scenario that is possible where Sidecar should still
>> be able to migrate to a new instance.
>>
>> Alternatively, Cassandra itself could have some flag to start up with
>> limited subsystems enabled to allow live migration.
>>
>> In any case, we'll need to weigh the pros and cons of each alternative
>> and decide if the live migration process can be handled within the C*
>> process itself or if we allow this functionality to be handled by
>> Sidecar.
>>
>> I am looking forward to this feature though, as it will be of great
>> value for many users across the ecosystem.
>>
>> On 2024/04/18 22:25:23 Jon Haddad wrote:
>> > Hmm... I guess if you're using encryption you can't use ZCS, so
>> > there's that.
>> >
>> > It probably makes sense to implement kernel TLS:
>> > https://www.kernel.org/doc/html/v5.7/networking/tls.html
>> >
>> > Then we can get ZCS all the time, for bootstrap & replacements.
>> >
>> > Jon
>> >
>> >
>> > On Thu, Apr 18, 2024 at 12:50 PM Jon Haddad <j...@jonhaddad.com> wrote:
>> >
>> > > Ariel, having it in C* process makes sense to me.
>> > >
>> > > Please correct me if I'm wrong here, but shouldn't using ZCS to
>> > > transfer have no distinguishable difference in overhead from doing it
>> > > using the sidecar?  Since the underlying call is sendfile, never
>> > > hitting userspace, I can't see why we'd opt for the transfer in
>> > > sidecar.  What's the advantage of duplicating the work that's already
>> > > been done?
>> > >
>> > > I can see using the sidecar for coordination to start and stop
>> > > instances or do things that require something out of process.
>> > >
>> > > Jon
>> > >
>> > >
>> > > On Thu, Apr 18, 2024 at 12:44 PM Ariel Weisberg <ar...@weisberg.ws>
>> > > wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> If there is a faster/better way to replace a node, why not have
>> > >> Cassandra support that natively without the sidecar so people who
>> > >> aren’t running the sidecar can benefit?
>> > >>
>> > >> Copying files over a network shouldn’t be slow in C* and it would
>> > >> also already have all the connectivity issues solved.
>> > >>
>> > >> Regards,
>> > >> Ariel
>> > >>
>> > >> On Fri, Apr 5, 2024, at 6:46 AM, Venkata Hari Krishna Nukala wrote:
>> > >>
>> > >> Hi all,
>> > >>
>> > >> I have filed CEP-40 [1] for live migrating Cassandra instances using
>> > >> the Cassandra Sidecar.
>> > >>
>> > >> When someone needs to move all or a portion of the Cassandra nodes
>> > >> belonging to a cluster to different hosts, the traditional approach
>> > >> of Cassandra node replacement can be time-consuming due to repairs
>> > >> and the bootstrapping of new nodes. Depending on the volume of the
>> > >> storage service load, replacements (repair + bootstrap) may take
>> > >> anywhere from a few hours to days.
>> > >>
>> > >> I am proposing a Sidecar-based solution to address these challenges.
>> > >> This solution transfers data from the old host (source) to the new
>> > >> host (destination) and then brings up the Cassandra process at the
>> > >> destination, to enable fast instance migration. This approach would
>> > >> help to minimise node downtime, as it is based on a Sidecar solution
>> > >> for data transfer and avoids repairs and bootstrap.
>> > >>
>> > >> Looking forward to the discussions.
>> > >>
>> > >> [1]
>> > >>
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
>> > >>
>> > >> Thanks!
>> > >> Hari
>> > >>
>> > >>
>> > >>
>> >
>>
>
