Basically, we also have to think about how operable these changes will be
for operators in multi-tenant, multi-cluster/dc environments w.r.t. key
rotations, security, key deployments etc.

On Fri, Nov 19, 2021 at 8:03 PM Maulin Vasavada <maulin.vasav...@gmail.com>
wrote:

> Hi all
>
> Really interesting discussion. I started reading this thread and still
> have a lot to catch up on, but in my experience many big organizations
> ultimately settle on over-the-wire encryption combined with OS/disk
> encryption to comply with their security requirements, for reasons such
> as:
> 1. Potential performance challenges at high scale of data
> movement/mirroring
> 2. Internal security groups/zoning structures and restrictions (such as
> restrictions on key sharing between zones, which make cross-zone
> mirroring/replication impossible)
> 3. Managing and maintaining an in-house Key Management System is quite a
> challenging overhead for on-prem installations, and when things move to
> the cloud, organizations ultimately opt for the cloud provider's on-disk
> encryption plus over-the-wire encryption with TLS or SASL over SSL,
> because the whole application migration/adoption becomes a challenging
> multi-year program.
>
> We experienced challenges even with AES-NI/JDK9+/Kernel TLS on Linux, but
> that was because we were looking at per-message encryption (in the Kafka
> world) with an asymmetric envelope, so it may be out of context here.
>
> Nonetheless, I will read the thread in more detail to learn more; it has
> really been a great technical discussion.
>
> Thanks
> Maulin
>
>
>
>
> On Fri, Nov 19, 2021 at 2:05 PM Kokoori, Shylaja <
> shylaja.koko...@intel.com> wrote:
>
>> I agree with Joey; the kernel should also be able to take advantage of
>> the crypto acceleration.
>>
>> I also want to add, since JDK performance is a concern here, that newer
>> Intel Ice Lake server platforms support VAES and SHA-NI, which further
>> accelerate AES-GCM performance by 2x and SHA1 performance by ~6x using
>> JDK 11.
>>
>> Some configuration information for the tests I ran.
>>
>>     - JDK version used was JDK14 (should behave similarly with JDK11).
>>     - Since the tests were done before 4.0 GA'd, the Cassandra version
>>       used was 4.0-beta3. Dataset size was ~500G.
>>     - Workloads tested were 100% reads, 100% updates, and an 80:20 mix
>>       with cassandra-stress. I have not tested streaming yet.
>>
>> I would be happy to provide additional data points or make necessary code
>> changes based on recommendations from folks here.
>>
>> Thanks,
>> Shylaja
>>
>> -----Original Message-----
>> From: Joshua McKenzie <jmcken...@apache.org>
>> Sent: Friday, November 19, 2021 4:53 AM
>> To: dev@cassandra.apache.org
>> Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption
>>
>> >
>> > Setting performance requirements in this regard is nonsense. As long
>> > as it's reasonably usable in the real world, and Cassandra makes the
>> > estimated effects on performance available, it will be up to the
>> > operators to decide whether to turn on the feature.
>>
>> I think Joey's argument, and correct me if I'm wrong, is that
>> implementing a complex feature in Cassandra that we then have to manage
>> that's essentially worse in every way compared to a built-in full-disk
>> encryption option via LUKS+LVM etc is a poor use of our time and energy.
>>
>> i.e. we'd be better off investing our time into documenting how to do
>> full disk encryption in a variety of scenarios + explaining why that is our
>> recommended approach instead of taking the time and energy to design,
>> implement, debug, and then maintain an inferior solution.
>>
>> On Fri, Nov 19, 2021 at 7:49 AM Joshua McKenzie <jmcken...@apache.org>
>> wrote:
>>
>> > Are you for real here?
>> >
>> > Please keep things cordial. Statements like this don't help move the
>> > conversation along.
>> >
>> >
>> > On Fri, Nov 19, 2021 at 3:57 AM Stefan Miklosovic <
>> > stefan.mikloso...@instaclustr.com> wrote:
>> >
>> >> On Fri, 19 Nov 2021 at 02:51, Joseph Lynch <joe.e.ly...@gmail.com> wrote:
>> >> >
>> >> > On Thu, Nov 18, 2021 at 7:23 PM Kokoori, Shylaja <shylaja.koko...@intel.com> wrote:
>> >> >
>> >> > > To address Joey's concern, the OpenJDK JVM and its derivatives
>> >> > > optimize Java crypto based on the underlying HW capabilities. For
>> >> > > example, if the underlying HW supports AES-NI, JVM intrinsics will
>> >> > > use those for crypto operations. Likewise, the new vector AES
>> >> > > available on the latest Intel platform is utilized by the JVM while
>> >> > > running on that platform to make crypto operations faster.
>> >> > >
>> >> >
>> >> > Which JDK version were you running? We have had a number of issues
>> >> > with the JVM being 2-10x slower than native crypto on Java 8
>> >> > (especially MD5, SHA1, and AES-GCM) and to a lesser extent Java 11
>> >> > (usually ~2x slower). Again I think we could get the JVM crypto
>> >> > penalty down to ~2x native if we linked in e.g. ACCP by default
>> >> > [1, 2] but even the very best Java crypto I've seen (fully utilizing
>> >> > hardware instructions) is still ~2x slower than native code. The
>> >> > operating system has a number of advantages here in that it doesn't
>> >> > pay JVM allocation costs or the JNI barrier (in the case of ACCP),
>> >> > and the kernel also takes advantage of hardware instructions.
>> >> >
>> >> >
>> >> > > From our internal experiments, we see single digit % regression
>> >> > > when transparent data encryption is enabled.
>> >> > >
>> >> >
>> >> > Which workloads are you testing and how are you measuring the
>> >> > regression? I suspect that compaction, repair (validation
>> >> > compaction), streaming, and quorum reads are probably much slower
>> >> > (probably ~10x slower for the throughput bound operations and ~2x
>> >> > slower on the read path). As compaction/repair/streaming usually
>> >> > take up between 10-20% of available CPU cycles, making them 2x
>> >> > slower might show up as a <10% overall utilization increase when
>> >> > you've really regressed 100% or more on key metrics (compaction
>> >> > throughput, streaming throughput, memory allocation rate, etc.).
>> >> > For example, if compaction was able to achieve 2 MiBps of
>> >> > throughput before encryption and only 1 MiBps afterwards, that
>> >> > would be a huge real world impact to operators as compactions now
>> >> > take twice as long.
>> >> >
>> >> > I think a CEP or details on the ticket that indicate the
>> >> > performance tests and workloads that will be run might be wise?
>> >> > Perhaps something like "encryption creates no more than a 1%
>> >> > regression of: compaction throughput (MiBps), streaming throughput
>> >> > (MiBps), repair validation throughput (duration of full repair on
>> >> > the entire cluster), read throughput at 10ms p99 tail at quorum
>> >> > consistency (QPS handled while not exceeding P99 SLO of 10ms), etc
>> >> > ... while a sustained load is applied to a multi-node cluster"?
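
A criterion of the kind proposed above could be checked mechanically. The
following is only a sketch in Java: the class name, method names, and sample
numbers are hypothetical (no such numbers were measured in this thread), and
it assumes "higher is better" metrics such as throughput in MiBps or QPS.

```java
final class RegressionGate {
    /** Percent by which a "higher is better" metric dropped; negative means it improved. */
    static double regressionPercent(double baseline, double withEncryption) {
        return (baseline - withEncryption) / baseline * 100.0;
    }

    /** True when the drop stays within the agreed budget (for example 1.0 for 1%). */
    static boolean withinBudget(double baseline, double withEncryption, double budgetPercent) {
        return regressionPercent(baseline, withEncryption) <= budgetPercent;
    }

    public static void main(String[] args) {
        // Illustrative values only, not results from any real test run.
        double baselineMiBps = 100.0;
        double encryptedMiBps = 99.5;
        System.out.printf("regression: %.2f%%, within 1%% budget: %b%n",
                regressionPercent(baselineMiBps, encryptedMiBps),
                withinBudget(baselineMiBps, encryptedMiBps, 1.0));
    }
}
```

Duration-style metrics (such as full repair time) would need the inverse
comparison, since lower is better there.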
>> >>
>> >> Are you for real here? Nobody will ever guarantee you these 1% numbers
>> >> ... come on. I think we are super paranoid about performance when we
>> >> are not paranoid enough about security. This is a two way street.
>> >> People are willing to give up on performance if security is a must.
>> >> You do not need to use it if you do not want to, it is not like we
>> >> are going to turn it on and you have to stick with that. Are you just
>> >> saying that we are going to protect people from using some security
>> >> features because their db might be slow? What if they just don't care?
>> >>
>> >> > Even a microbenchmark that just sees how long it takes to encrypt
>> >> > and decrypt a 500MiB dataset using the proposed JVM implementation
>> >> > versus encrypting it with a native implementation might be enough
>> >> > to confirm/deny. For example, keypipe (C, [3]) achieves around 2.8
>> >> > GiBps symmetric with AES-GCM and age (golang, ChaCha20-Poly1305,
>> >> > [4]) achieves about 1.6 GiBps encryption and 1.0 GiBps decryption;
>> >> > in my past experience, Java crypto would achieve maybe 200 MiBps of
>> >> > _non-authenticated_ AES.
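
For the JVM side of such a microbenchmark, a minimal sketch using only the
standard javax.crypto API might look like the following. It is not the
implementation proposed in CASSANDRA-9633: the class name, the 64 KiB chunk
size, the 128-bit key, the counter-based IV, and the 64 MiB default data size
are all assumptions chosen to keep the example short. A serious comparison
would use a harness such as JMH and the 500 MiB dataset suggested above.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class AesGcmThroughput {
    static final int CHUNK_BYTES = 64 * 1024; // assumed chunk size, not from the proposal
    static final int TAG_BITS = 128;

    /** Returns {encryptMiBps, decryptMiBps} for totalMiB of random data. */
    static double[] measure(int totalMiB) {
        try {
            KeyGenerator keyGen = KeyGenerator.getInstance("AES");
            keyGen.init(128);
            SecretKey key = keyGen.generateKey();

            byte[] plain = new byte[CHUNK_BYTES];
            new SecureRandom().nextBytes(plain);

            Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
            Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
            int chunks = totalMiB * (1024 * 1024 / CHUNK_BYTES);
            long encNanos = 0;
            long decNanos = 0;

            for (int i = 0; i < chunks; i++) {
                // Unique 96-bit IV per chunk, derived from the chunk counter.
                byte[] iv = ByteBuffer.allocate(12).putInt(0).putLong(i).array();
                GCMParameterSpec spec = new GCMParameterSpec(TAG_BITS, iv);

                long t0 = System.nanoTime();
                enc.init(Cipher.ENCRYPT_MODE, key, spec);
                byte[] sealed = enc.doFinal(plain);
                long t1 = System.nanoTime();
                dec.init(Cipher.DECRYPT_MODE, key, spec);
                byte[] opened = dec.doFinal(sealed);
                long t2 = System.nanoTime();

                if (!Arrays.equals(plain, opened)) {
                    throw new IllegalStateException("round trip mismatch");
                }
                encNanos += t1 - t0;
                decNanos += t2 - t1;
            }
            return new double[] { totalMiB / (encNanos / 1e9), totalMiB / (decNanos / 1e9) };
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        measure(16); // crude warm-up so the JIT has compiled the hot path
        double[] r = measure(64);
        System.out.printf("AES-GCM encrypt: %.0f MiB/s, decrypt: %.0f MiB/s%n", r[0], r[1]);
    }
}
```

The numbers it prints depend heavily on the JDK version and CPU, so they are
only meaningful when compared against a native tool run on the same machine.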
>> >> >
>> >> > Cheers,
>> >> > -Joey
>> >> >
>> >> > [1] https://issues.apache.org/jira/browse/CASSANDRA-15294
>> >> > [2] https://github.com/corretto/amazon-corretto-crypto-provider
>> >> > [3] https://github.com/hashbrowncipher/keypipe#encryption
>> >> > [4] https://github.com/FiloSottile/age
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> >> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> >>
>> >>
>>
>
