https://bugs.openjdk.java.net/browse/JDK-7184394 added AES intrinsics in Java 8, in 2012. While it's always possible to have a regression, and it's important to understand the performance impact, stories of 2-10x slowdowns sound apocryphal. If they're all using the same intrinsics, the performance should be roughly the same. I think the real challenge will be key management, not performance.
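Whether a given JVM is actually using those intrinsics can be checked rather than assumed, by reading the relevant HotSpot flags at runtime. This is a rough sketch under stated assumptions: it needs a HotSpot-based JDK (it relies on `com.sun.management.HotSpotDiagnosticMXBean`), the class name `AesIntrinsicFlags` is made up, and which of the listed flags exist depends on the JDK version and CPU architecture. On some JDKs a few of them may be diagnostic flags that only become visible with `-XX:+UnlockDiagnosticVMOptions`.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints the AES-related HotSpot flags of the running JVM.
public class AesIntrinsicFlags {
    static final String NOT_VISIBLE = "not visible (absent, or a locked diagnostic flag)";

    // HotSpot flag names; availability varies by JDK version and CPU architecture.
    static final String[] FLAGS = {
        "UseAES", "UseAESIntrinsics", "UseAESCTRIntrinsics", "UseGHASHIntrinsics"
    };

    // Looks up one VM flag through the HotSpot diagnostic MXBean.
    static String lookup(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption option = bean.getVMOption(name);
            return option.getValue() + " (origin: " + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            // getVMOption throws for names it cannot see; hidden diagnostic flags may land here too.
            return NOT_VISIBLE;
        }
    }

    public static void main(String[] args) {
        System.out.println(System.getProperty("java.vm.name") + " "
                + System.getProperty("java.version") + " on " + System.getProperty("os.arch"));
        for (String flag : FLAGS) {
            System.out.println(flag + " = " + lookup(flag));
        }
    }
}
```

Running `java AesIntrinsicFlags` under each JDK being compared (for example the Java 8 and Java 11 builds discussed in this thread) would show whether the comparison is like for like before any timing is done.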
Derek

On Fri, Nov 19, 2021 at 7:41 AM Bowen Song <bo...@bso.ng.invalid> wrote:
> On the performance note, I copy & pasted a small piece of Java code to
> do AES256-CBC on the stdin and write the result to stdout. I then ran
> the following two commands on the same machine (with AES-NI) for
> comparison:
>
> $ dd if=/dev/zero bs=4096 count=$((4*1024*1024)) status=none | time /usr/lib/jvm/java-11-openjdk/bin/java -jar aes-bench.jar >/dev/null
> 36.24s user 5.96s system 100% cpu 41.912 total
> $ dd if=/dev/zero bs=4096 count=$((4*1024*1024)) status=none | time openssl enc -aes-256-cbc -e -K "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef" -iv "0123456789abcdef0123456789abcdef" >/dev/null
> 31.09s user 3.92s system 99% cpu 35.043 total
>
> This is not an accurate test of the AES performance, as the Java test
> includes the JVM start up time and the key and IV generation in the Java
> code. But this gives us a pretty good idea that the total performance
> regression is definitely far from the 2x to 10x slower claimed in some
> previous emails.
>
>
> The Java code I used:
>
> package com.example.AesBenchmark;
>
> import java.security.Security;
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.FileOutputStream;
> import java.security.SecureRandom;
>
> import javax.crypto.Cipher;
> import javax.crypto.KeyGenerator;
> import javax.crypto.SecretKey;
> import javax.crypto.spec.IvParameterSpec;
> import javax.crypto.spec.SecretKeySpec;
>
> public class AesBenchmark {
>     static {
>         try {
>             Security.setProperty("crypto.policy", "unlimited");
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
>     }
>
>     static final int BUF_LEN = 4096;
>
>     public static void main(String[] args) throws Exception
>     {
>         KeyGenerator keyGenerator = KeyGenerator.getInstance("AES");
>         keyGenerator.init(256);
>
>         // Generate Key
>         SecretKey key = keyGenerator.generateKey();
>
>         // Generating IV.
>         byte[] IV = new byte[16];
>         SecureRandom random = new SecureRandom();
>         random.nextBytes(IV);
>
>         //Get Cipher Instance
>         Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
>
>         //Create SecretKeySpec
>         SecretKeySpec keySpec = new SecretKeySpec(key.getEncoded(), "AES");
>
>         //Create IvParameterSpec
>         IvParameterSpec ivSpec = new IvParameterSpec(IV);
>
>         //Initialize Cipher for ENCRYPT_MODE
>         cipher.init(Cipher.ENCRYPT_MODE, keySpec, ivSpec);
>
>         byte[] bufInput = new byte[BUF_LEN];
>         FileInputStream fis = new FileInputStream(new File("/dev/stdin"));
>         FileOutputStream fos = new FileOutputStream(new File("/dev/stdout"));
>         int nBytes;
>         while ((nBytes = fis.read(bufInput, 0, BUF_LEN)) != -1)
>         {
>             // update() returns null when the input is too short to complete a block
>             byte[] out = cipher.update(bufInput, 0, nBytes);
>             if (out != null) {
>                 fos.write(out);
>             }
>         }
>         fos.write(cipher.doFinal());
>     }
> }
>
> On 19/11/2021 13:28, Jeff Jirsa wrote:
> >
> > For better or worse, different threat models mean that it’s not strictly
> > better to do FDE and some use cases definitely want this at the db layer
> > instead of file system.
> >
> >> On Nov 19, 2021, at 12:54 PM, Joshua McKenzie <jmcken...@apache.org> wrote:
> >>
> >>>
> >>> setting performance requirements on this regard is a
> >>> nonsense. As long as it's reasonably usable in real world, and Cassandra
> >>> makes the estimated effects on performance available, it will be up to
> >>> the operators to decide whether to turn on the feature
> >> I think Joey's argument, and correct me if I'm wrong, is that implementing
> >> a complex feature in Cassandra that we then have to manage that's
> >> essentially worse in every way compared to a built-in full-disk encryption
> >> option via LUKS+LVM etc is a poor use of our time and energy.
> >>
> >> i.e. we'd be better off investing our time into documenting how to do full
> >> disk encryption in a variety of scenarios + explaining why that is our
> >> recommended approach instead of taking the time and energy to design,
> >> implement, debug, and then maintain an inferior solution.
> >>
> >>> On Fri, Nov 19, 2021 at 7:49 AM Joshua McKenzie <jmcken...@apache.org> wrote:
> >>>
> >>> Are you for real here?
> >>>
> >>> Please keep things cordial. Statements like this don't help move the
> >>> conversation along.
> >>>
> >>>
> >>> On Fri, Nov 19, 2021 at 3:57 AM Stefan Miklosovic <stefan.mikloso...@instaclustr.com> wrote:
> >>>
> >>>> On Fri, 19 Nov 2021 at 02:51, Joseph Lynch <joe.e.ly...@gmail.com> wrote:
> >>>>> On Thu, Nov 18, 2021 at 7:23 PM Kokoori, Shylaja <shylaja.koko...@intel.com> wrote:
> >>>>>
> >>>>>> To address Joey's concern, the OpenJDK JVM and its derivatives optimize
> >>>>>> Java crypto based on the underlying HW capabilities. For example, if the
> >>>>>> underlying HW supports AES-NI, JVM intrinsics will use those for crypto
> >>>>>> operations. Likewise, the new vector AES available on the latest Intel
> >>>>>> platform is utilized by the JVM while running on that platform to make
> >>>>>> crypto operations faster.
> >>>>>>
> >>>>> Which JDK version were you running? We have had a number of issues with the
> >>>>> JVM being 2-10x slower than native crypto on Java 8 (especially MD5, SHA1,
> >>>>> and AES-GCM) and to a lesser extent Java 11 (usually ~2x slower). Again I
> >>>>> think we could get the JVM crypto penalty down to ~2x native if we linked
> >>>>> in e.g. ACCP by default [1, 2] but even the very best Java crypto I've seen
> >>>>> (fully utilizing hardware instructions) is still ~2x slower than native
> >>>>> code. The operating system has a number of advantages here in that they
> >>>>> don't pay JVM allocation costs or the JNI barrier (in the case of ACCP) and
> >>>>> the kernel also takes advantage of hardware instructions.
> >>>>>
> >>>>>
> >>>>>> From our internal experiments, we see single digit % regression when
> >>>>>> transparent data encryption is enabled.
> >>>>>>
> >>>>> Which workloads are you testing and how are you measuring the regression? I
> >>>>> suspect that compaction, repair (validation compaction), streaming, and
> >>>>> quorum reads are probably much slower (probably ~10x slower for the
> >>>>> throughput bound operations and ~2x slower on the read path). As
> >>>>> compaction/repair/streaming usually take up between 10-20% of available CPU
> >>>>> cycles making them 2x slower might show up as <10% overall utilization
> >>>>> increase when you've really regressed 100% or more on key metrics
> >>>>> (compaction throughput, streaming throughput, memory allocation rate, etc
> >>>>> ...). For example, if compaction was able to achieve 2 MiBps of throughput
> >>>>> before encryption and it was only able to achieve 1 MiBps of throughput
> >>>>> afterwards, that would be a huge real world impact to operators as
> >>>>> compactions now take twice as long.
> >>>>>
> >>>>> I think a CEP or details on the ticket that indicate the performance tests
> >>>>> and workloads that will be run might be wise? Perhaps something like
> >>>>> "encryption creates no more than a 1% regression of: compaction throughput
> >>>>> (MiBps), streaming throughput (MiBps), repair validation throughput
> >>>>> (duration of full repair on the entire cluster), read throughput at 10ms
> >>>>> p99 tail at quorum consistency (QPS handled while not exceeding P99 SLO of
> >>>>> 10ms), etc ... while a sustained load is applied to a multi-node cluster"?
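The compaction example mixes two different numbers for the same change: halving throughput is a 50% throughput loss but a 100% increase in how long a fixed amount of work takes (duration = size / throughput). A minimal sketch of that arithmetic, with hypothetical names, treating the proposed "no more than a 1% regression" as a threshold on fractional throughput loss; that is only one possible reading of the wording above.

```java
// Hypothetical helper for the throughput arithmetic in the compaction example.
public class ThroughputRegression {
    // Fractional loss of throughput relative to the baseline: (before - after) / before.
    static double throughputLoss(double beforeMiBps, double afterMiBps) {
        return (beforeMiBps - afterMiBps) / beforeMiBps;
    }

    // For a fixed amount of data, duration = size / throughput, so the
    // relative increase in duration is before / after - 1.
    static double durationIncrease(double beforeMiBps, double afterMiBps) {
        return beforeMiBps / afterMiBps - 1.0;
    }

    // Budget check in the spirit of "no more than a 1% regression" (assumed reading).
    static boolean withinBudget(double beforeMiBps, double afterMiBps, double maxLoss) {
        return throughputLoss(beforeMiBps, afterMiBps) <= maxLoss;
    }

    public static void main(String[] args) {
        double before = 2.0, after = 1.0; // the 2 MiBps -> 1 MiBps compaction example
        System.out.printf("throughput loss: %.0f%%%n", 100 * throughputLoss(before, after));
        System.out.printf("duration increase: %.0f%%%n", 100 * durationIncrease(before, after));
        System.out.println("within 1% budget: " + withinBudget(before, after, 0.01));
    }
}
```

For the 2 MiBps to 1 MiBps case this prints a 50% throughput loss and a 100% duration increase, which is the "regressed 100%" in the compactions-take-twice-as-long sense.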
> >>>>
> >>>> Are you for real here? Nobody will ever guarantee you these 1% numbers
> >>>> ... come on. I think we are super paranoid about performance when we
> >>>> are not paranoid enough about security. This is a two-way street.
> >>>> People are willing to give up on performance if security is a must.
> >>>> You do not need to use it if you do not want to; it is not like we are
> >>>> going to turn it on and you have to stick with that. Are you just
> >>>> saying that we are going to protect people from using some security
> >>>> features because their db might be slow? What if they just don't care?
> >>>>
> >>>>> Even a microbenchmark that just sees how long it takes to encrypt and
> >>>>> decrypt a 500MiB dataset using the proposed JVM implementation versus
> >>>>> encrypting it with a native implementation might be enough to
> >>>>> confirm/deny. For example, keypipe (C, [3]) achieves around 2.8 GiBps
> >>>>> of AES-GCM for both encryption and decryption, and age (golang,
> >>>>> ChaCha20-Poly1305, [4]) achieves about 1.6 GiBps encryption and 1.0 GiBps
> >>>>> decryption; my past experience with Java crypto is that it would achieve
> >>>>> maybe 200 MiBps of _non-authenticated_ AES.
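An in-process version of that microbenchmark could look roughly like the sketch below. It is an illustration under stated assumptions, not the proposed implementation: it uses only the default JCE provider, a hypothetical 64 KiB chunk size and counter-derived IVs, and runs one warm-up pass first so that JVM start-up and JIT compilation (the caveat in Bowen's test) stay out of the timed pass. AES-GCM is included because it is one of the modes named as slow; AES-CTR stands in for non-authenticated AES.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.spec.AlgorithmParameterSpec;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical JCE throughput sketch; not a substitute for a proper JMH benchmark.
public class JceThroughput {
    // Hypothetical chunk size; a real test should use the chunk size the feature uses.
    static final int CHUNK = 64 * 1024;

    // Derives a distinct IV from a counter so no (key, IV) pair is reused for encryption.
    static AlgorithmParameterSpec ivFor(String transformation, long counter) {
        if (transformation.startsWith("AES/GCM")) {
            byte[] iv = ByteBuffer.allocate(12).putLong(4, counter).array();
            return new GCMParameterSpec(128, iv);
        }
        // CTR: chunk counter in the high 8 bytes, low 8 bytes left for the block counter.
        byte[] iv = ByteBuffer.allocate(16).putLong(0, counter).array();
        return new IvParameterSpec(iv);
    }

    // Encrypts roughly totalBytes in CHUNK pieces, then decrypts the same volume,
    // and returns {encryptMiBps, decryptMiBps}. A fresh key per call keeps IVs unique.
    static double[] measure(String transformation, long totalBytes) {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(256);
            SecretKey key = kg.generateKey();
            byte[] plain = new byte[CHUNK];
            new SecureRandom().nextBytes(plain);
            long chunks = Math.max(1, totalBytes / CHUNK);
            double totalMiB = chunks * (double) CHUNK / (1024 * 1024);
            Cipher cipher = Cipher.getInstance(transformation);

            byte[] ct = null;
            AlgorithmParameterSpec iv = null;
            long start = System.nanoTime();
            for (long i = 0; i < chunks; i++) {
                iv = ivFor(transformation, i);
                cipher.init(Cipher.ENCRYPT_MODE, key, iv);
                ct = cipher.doFinal(plain);
            }
            double encSeconds = (System.nanoTime() - start) / 1e9;

            // Decrypt the last ciphertext repeatedly; reusing an IV is harmless for decryption.
            byte[] pt = null;
            start = System.nanoTime();
            for (long i = 0; i < chunks; i++) {
                cipher.init(Cipher.DECRYPT_MODE, key, iv);
                pt = cipher.doFinal(ct);
            }
            double decSeconds = (System.nanoTime() - start) / 1e9;

            if (!Arrays.equals(plain, pt)) {
                throw new IllegalStateException("round trip failed for " + transformation);
            }
            return new double[] {totalMiB / encSeconds, totalMiB / decSeconds};
        } catch (GeneralSecurityException e) {
            // Wrap checked JCA exceptions so callers need no throws clause.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Defaults to the 500 MiB suggested above; pass a smaller size in MiB for a quick run.
        long mib = args.length > 0 ? Long.parseLong(args[0]) : 500;
        for (String t : new String[] {"AES/GCM/NoPadding", "AES/CTR/NoPadding"}) {
            measure(t, mib * 1024 * 1024 / 4); // warm-up pass for the JIT; result discarded
            double[] r = measure(t, mib * 1024 * 1024);
            System.out.printf("%-18s encrypt %6.0f MiB/s, decrypt %6.0f MiB/s%n", t, r[0], r[1]);
        }
    }
}
```

`java JceThroughput 500` would run it at the 500 MiB size; the figures only mean something when set against a native tool run on the same machine.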
> >>>>>
> >>>>> Cheers,
> >>>>> -Joey
> >>>>>
> >>>>> [1] https://issues.apache.org/jira/browse/CASSANDRA-15294
> >>>>> [2] https://github.com/corretto/amazon-corretto-crypto-provider
> >>>>> [3] https://github.com/hashbrowncipher/keypipe#encryption
> >>>>> [4] https://github.com/FiloSottile/age
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>>>
>

--
+---------------------------------------------------------------+
| Derek Chen-Becker                                             |
| GPG Key available at https://keybase.io/dchenbecker and       |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC   |
+---------------------------------------------------------------+