Basically, we also have to think about how operable these changes will be for operators in multi-tenant, multi-cluster/multi-DC environments with respect to key rotation, security, key deployment, etc.
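Key rotation is one of the operational concerns raised above. A common way to keep rotation cheap is envelope encryption. Maulin mentions an asymmetric envelope in the quoted thread below. In this scheme, data is encrypted under a random data key, and only that small data key is wrapped under a key-encryption key (KEK), which is the key that gets rotated. The Java sketch below uses only the standard JCE. Several choices in it are illustrative assumptions and are not proposed anywhere for CASSANDRA-9633: the class and method names, RSA-OAEP as the KEK algorithm, AES-256-GCM for the data, and holding both KEKs in memory. In practice, KEKs would more likely live in a KMS or keystore.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical illustration of envelope encryption with KEK rotation (not a Cassandra API).
public class EnvelopeRotationSketch {
    private static final String KEK_CIPHER = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String DATA_CIPHER = "AES/GCM/NoPadding";
    private static final SecureRandom RANDOM = new SecureRandom();

    static KeyPair newKek() throws GeneralSecurityException {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        return gen.generateKeyPair();
    }

    // Encrypts the small data key under the KEK's public key.
    static byte[] wrap(SecretKey dataKey, KeyPair kek) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance(KEK_CIPHER);
        c.init(Cipher.WRAP_MODE, kek.getPublic());
        return c.wrap(dataKey);
    }

    // Recovers the data key with the KEK's private key.
    static SecretKey unwrap(byte[] wrapped, KeyPair kek) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance(KEK_CIPHER);
        c.init(Cipher.UNWRAP_MODE, kek.getPrivate());
        return (SecretKey) c.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
    }

    // Encrypts `message`, rotates the KEK by rewrapping only the data key,
    // then decrypts with the data key recovered through the new KEK.
    static String encryptRotateDecrypt(String message) {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(256);
            SecretKey dataKey = kg.generateKey();
            byte[] iv = new byte[12];
            RANDOM.nextBytes(iv);

            Cipher enc = Cipher.getInstance(DATA_CIPHER);
            enc.init(Cipher.ENCRYPT_MODE, dataKey, new GCMParameterSpec(128, iv));
            byte[] ciphertext = enc.doFinal(message.getBytes(StandardCharsets.UTF_8));

            KeyPair oldKek = newKek();
            byte[] wrapped = wrap(dataKey, oldKek); // stored next to the ciphertext

            // Rotation: unwrap with the old KEK, rewrap with the new one.
            // The ciphertext itself is never re-encrypted.
            KeyPair newKek = newKek();
            byte[] rewrapped = wrap(unwrap(wrapped, oldKek), newKek);

            Cipher dec = Cipher.getInstance(DATA_CIPHER);
            dec.init(Cipher.DECRYPT_MODE, unwrap(rewrapped, newKek), new GCMParameterSpec(128, iv));
            return new String(dec.doFinal(ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encryptRotateDecrypt("row data")); // prints "row data"
    }
}
```

In this sketch, the cost of a rotation is proportional to the number of wrapped data keys, not to the volume of encrypted data. The harder operational questions above fall outside the sketch: where KEKs live and how they are distributed across tenants, clusters, and DCs.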
On Fri, Nov 19, 2021 at 8:03 PM Maulin Vasavada <[email protected]> wrote:
> Hi all
>
> Really interesting discussion. I started reading this thread and still have a lot to catch up on, but in my experience many big organizations ultimately settle on over-the-wire encryption combined with OS/disk encryption to comply with their security requirements, for various reasons such as:
> 1. Potential performance challenges at high scale of data movement/mirroring
> 2. Internal security groups/zoning structures and restrictions (e.g., restrictions on key sharing between zones, which make cross-zone mirroring/replication impossible)
> 3. Managing and maintaining an in-house Key Management System is a challenging overhead for on-prem installations; when things move to the cloud, organizations ultimately opt for the cloud provider's on-disk encryption plus over-the-wire encryption with TLS (or SASL over SSL), because the whole application migration/adoption becomes a challenging multi-year program.
>
> We experienced challenges even with AES-NI/JDK 9+/kernel TLS on Linux, but that was because we were looking at per-message encryption (in the Kafka world) with an asymmetric envelope, so it may be out of context here.
>
> Nonetheless, I will read the thread in more detail to learn more; it has been a really great technical discussion.
>
> Thanks
> Maulin
>
> On Fri, Nov 19, 2021 at 2:05 PM Kokoori, Shylaja <[email protected]> wrote:
>> I agree with Joey; the kernel should also be able to take advantage of crypto acceleration.
>>
>> I also want to add, since JDK performance is a concern here, that newer Intel Ice Lake server platforms support VAES and SHA-NI, which further accelerate AES-GCM performance by 2x and SHA-1 performance by ~6x using JDK 11.
>>
>> Some configuration information for the tests I ran:
>>
>> - The JDK version used was JDK 14 (it should behave similarly with JDK 11).
>> - Since the tests were done before 4.0 went GA, the Cassandra version used was 4.0-beta3. The dataset size was ~500 GB.
>> - The workloads tested were 100% reads, 100% updates, and an 80:20 mix, using cassandra-stress. I have not tested streaming yet.
>>
>> I would be happy to provide additional data points or make the necessary code changes based on recommendations from folks here.
>>
>> Thanks,
>> Shylaja
>>
>> -----Original Message-----
>> From: Joshua McKenzie <[email protected]>
>> Sent: Friday, November 19, 2021 4:53 AM
>> To: [email protected]
>> Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption
>>
>> >
>> > setting performance requirements in this regard is nonsense. As long as it's reasonably usable in the real world, and Cassandra makes the estimated effects on performance available, it will be up to the operators to decide whether to turn on the feature
>>
>> I think Joey's argument (correct me if I'm wrong) is that implementing a complex feature in Cassandra that we then have to maintain, and that is essentially worse in every way than a built-in full-disk encryption option via LUKS+LVM etc., is a poor use of our time and energy.
>>
>> i.e., we'd be better off investing our time in documenting how to do full-disk encryption in a variety of scenarios, and explaining why that is our recommended approach, instead of taking the time and energy to design, implement, debug, and then maintain an inferior solution.
>>
>> On Fri, Nov 19, 2021 at 7:49 AM Joshua McKenzie <[email protected]> wrote:
>>
>> > Are you for real here?
>> >
>> > Please keep things cordial. Statements like this don't help move the conversation along.
>> >
>> > On Fri, Nov 19, 2021 at 3:57 AM Stefan Miklosovic <[email protected]> wrote:
>> >
>> >> On Fri, 19 Nov 2021 at 02:51, Joseph Lynch <[email protected]> wrote:
>> >> >
>> >> > On Thu, Nov 18, 2021 at 7:23 PM Kokoori, Shylaja <[email protected]> wrote:
>> >> >
>> >> > > To address Joey's concern, the OpenJDK JVM and its derivatives optimize Java crypto based on the underlying HW capabilities. For example, if the underlying HW supports AES-NI, JVM intrinsics will use those for crypto operations. Likewise, the new vector AES available on the latest Intel platform is utilized by the JVM while running on that platform to make crypto operations faster.
>> >> > >
>> >> >
>> >> > Which JDK version were you running? We have had a number of issues with the JVM being 2-10x slower than native crypto on Java 8 (especially MD5, SHA-1, and AES-GCM) and, to a lesser extent, on Java 11 (usually ~2x slower). Again, I think we could get the JVM crypto penalty down to ~2x native if we linked in e.g. ACCP by default [1, 2], but even the very best Java crypto I've seen (fully utilizing hardware instructions) is still ~2x slower than native code. The operating system has a number of advantages here, in that it doesn't pay JVM allocation costs or the JNI barrier (in the case of ACCP), and the kernel also takes advantage of hardware instructions.
>> >> >
>> >> > > From our internal experiments, we see a single-digit % regression when transparent data encryption is enabled.
>> >> > >
>> >> >
>> >> > Which workloads are you testing, and how are you measuring the regression?
>> >> > I suspect that compaction, repair (validation compaction), streaming, and quorum reads are probably much slower (probably ~10x slower for the throughput-bound operations and ~2x slower on the read path). As compaction/repair/streaming usually take up 10-20% of available CPU cycles, making them 2x slower might show up as a <10% overall utilization increase when you've really regressed 100% or more on key metrics (compaction throughput, streaming throughput, memory allocation rate, etc.). For example, if compaction was able to achieve 2 MiBps of throughput before encryption and only 1 MiBps afterwards, that would be a huge real-world impact on operators, as compactions would now take twice as long.
>> >> >
>> >> > I think a CEP, or details on the ticket indicating the performance tests and workloads that will be run, might be wise? Perhaps something like "encryption creates no more than a 1% regression of: compaction throughput (MiBps), streaming throughput (MiBps), repair validation throughput (duration of full repair on the entire cluster), read throughput at a 10ms p99 tail at quorum consistency (QPS handled while not exceeding a P99 SLO of 10ms), etc., while a sustained load is applied to a multi-node cluster"?
>> >>
>> >> Are you for real here? Nobody will ever guarantee you these 1% numbers... come on. I think we are super paranoid about performance when we are not paranoid enough about security. This is a two-way street. People are willing to give up on performance if security is a must. You do not need to use it if you do not want to; it is not like we are going to turn it on and you have to stick with that.
>> >> Are you just saying that we are going to protect people from using some security features because their DB might be slow? What if they just don't care?
>> >>
>> >> > Even a microbenchmark that just measures how long it takes to encrypt and decrypt a 500 MiB dataset using the proposed JVM implementation, versus encrypting it with a native implementation, might be enough to confirm or deny this. For example, keypipe (C, [3]) achieves around 2.8 GiBps of symmetric AES-GCM, and age (golang, ChaCha20-Poly1305, [4]) achieves about 1.6 GiBps encryption and 1.0 GiBps decryption; my past experience with Java crypto is that it would achieve maybe 200 MiBps of _non-authenticated_ AES.
>> >> >
>> >> > Cheers,
>> >> > -Joey
>> >> >
>> >> > [1] https://issues.apache.org/jira/browse/CASSANDRA-15294
>> >> > [2] https://github.com/corretto/amazon-corretto-crypto-provider
>> >> > [3] https://github.com/hashbrowncipher/keypipe#encryption
>> >> > [4] https://github.com/FiloSottile/age
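Joey's suggested microbenchmark can be approximated with the JDK's default provider alone. The sketch below is one possible starting point under stated assumptions. It is not the benchmark anyone in this thread ran. It prints the JVM version, since Joey asks which JDK was used. It also prints the HotSpot `UseAESIntrinsics` flag, read through `HotSpotDiagnosticMXBean`, which is relevant to Shylaja's point about AES-NI intrinsics. It then warms up the JIT and times AES-256-GCM encryption and decryption of random data. Several choices are arbitrary: the class name, the 1 MiB chunk size, the 64 MiB warm-up, and the 500 MiB default (taken from Joey's message). A JMH harness would give more trustworthy numbers. For the native side of the comparison, `openssl speed -evp aes-256-gcm` or the tools Joey cites (keypipe and age) could be run on the same host.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Rough AES-256-GCM throughput check using only the JDK's default provider.
public class GcmThroughputSketch {

    // Value of a HotSpot flag such as "UseAESIntrinsics", or "n/a" if the
    // running JVM does not expose it.
    static String vmFlag(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            return "n/a";
        }
    }

    // Encrypts and decrypts `totalBytes` of random data in `chunkBytes` chunks.
    // Returns {encrypt MiB/s, decrypt MiB/s}; only the cipher calls are timed.
    static double[] measure(int totalBytes, int chunkBytes) {
        if (chunkBytes <= 0 || totalBytes < chunkBytes) {
            throw new IllegalArgumentException("need totalBytes >= chunkBytes > 0");
        }
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(256);
            SecretKey key = kg.generateKey();
            SecureRandom random = new SecureRandom();
            byte[] chunk = new byte[chunkBytes];
            random.nextBytes(chunk);
            byte[] iv = new byte[12];
            Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
            Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");

            int chunks = totalBytes / chunkBytes;
            long encNanos = 0;
            long decNanos = 0;
            for (int i = 0; i < chunks; i++) {
                random.nextBytes(iv); // GCM must never reuse an IV with the same key
                GCMParameterSpec spec = new GCMParameterSpec(128, iv); // copies iv
                long t0 = System.nanoTime();
                enc.init(Cipher.ENCRYPT_MODE, key, spec);
                byte[] sealed = enc.doFinal(chunk);
                long t1 = System.nanoTime();
                dec.init(Cipher.DECRYPT_MODE, key, spec);
                byte[] opened = dec.doFinal(sealed); // also verifies the GCM tag
                long t2 = System.nanoTime();
                encNanos += t1 - t0;
                decNanos += t2 - t1;
                if (!Arrays.equals(opened, chunk)) {
                    throw new IllegalStateException("round trip mismatch");
                }
            }
            double mib = (double) chunks * chunkBytes / (1 << 20);
            return new double[] {mib / (encNanos / 1e9), mib / (decNanos / 1e9)};
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int totalMiB = args.length > 0 ? Integer.parseInt(args[0]) : 500;
        System.out.println("java.version=" + System.getProperty("java.version"));
        System.out.println("UseAESIntrinsics=" + vmFlag("UseAESIntrinsics"));
        measure(64 << 20, 1 << 20); // warm-up so the JIT can compile the hot paths
        double[] r = measure(totalMiB << 20, 1 << 20);
        System.out.printf("AES-256-GCM: encrypt %.0f MiB/s, decrypt %.0f MiB/s%n", r[0], r[1]);
    }
}
```

The timed region includes the output arrays that `doFinal` allocates. This is deliberate, because JVM allocation cost is one of the overheads Joey lists.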
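Joey's compaction arithmetic and his proposed acceptance criterion can be expressed as a per-metric check. This is a hypothetical sketch, not something from the ticket or a CEP. The class and method names are placeholders, the 1% budget is the figure Joey floats, and the only numbers used are his 2 MiBps to 1 MiBps compaction example. Checking each metric on its own, rather than aggregate CPU, is what catches a halved throughput that might move overall utilization by only a few percent.

```java
// Hypothetical per-metric regression check; names are placeholders.
public class RegressionBudgetSketch {

    // Percentage of throughput lost, e.g. 2 MiBps -> 1 MiBps is a 50% loss.
    static double throughputLossPct(double before, double after) {
        return (before - after) / before * 100.0;
    }

    // Percentage increase in duration for the same amount of work,
    // e.g. 2 MiBps -> 1 MiBps means jobs take 100% longer (twice as long).
    static double durationIncreasePct(double before, double after) {
        return (before / after - 1.0) * 100.0;
    }

    // True if the throughput loss stays within the budget (1.0 means 1%).
    static boolean withinBudget(double before, double after, double budgetPct) {
        return throughputLossPct(before, after) <= budgetPct;
    }

    public static void main(String[] args) {
        double before = 2.0; // compaction MiBps before encryption (Joey's example)
        double after = 1.0;  // compaction MiBps after encryption (Joey's example)
        System.out.printf("compaction: -%.0f%% throughput, +%.0f%% duration, within 1%% budget: %b%n",
                throughputLossPct(before, after), durationIncreasePct(before, after),
                withinBudget(before, after, 1.0));
        // prints "compaction: -50% throughput, +100% duration, within 1% budget: false"
    }
}
```

The other metrics in Joey's list, such as streaming throughput and QPS at a 10ms p99 SLO, would each get the same check. For a duration-type metric such as full-repair time, the comparison would need to be inverted.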
