On Thu, Nov 18, 2021 at 7:23 PM Kokoori, Shylaja <shylaja.koko...@intel.com> wrote:
> To address Joey's concern, the OpenJDK JVM and its derivatives optimize
> Java crypto based on the underlying HW capabilities. For example, if the
> underlying HW supports AES-NI, JVM intrinsics will use those for crypto
> operations. Likewise, the new vector AES available on the latest Intel
> platform is utilized by the JVM while running on that platform to make
> crypto operations faster.

Which JDK version were you running? We have had a number of issues with
the JVM being 2-10x slower than native crypto on Java 8 (especially MD5,
SHA1, and AES-GCM), and to a lesser extent on Java 11 (usually ~2x
slower). Again, I think we could get the JVM crypto penalty down to ~2x
native if we linked in e.g. ACCP by default [1, 2], but even the very
best Java crypto I've seen (fully utilizing hardware instructions) is
still ~2x slower than native code. The operating system has a number of
advantages here in that it doesn't pay JVM allocation costs or the JNI
barrier (in the case of ACCP), and the kernel also takes advantage of
hardware instructions.

> From our internal experiments, we see single digit % regression when
> transparent data encryption is enabled.

Which workloads are you testing and how are you measuring the
regression? I suspect that compaction, repair (validation compaction),
streaming, and quorum reads are probably much slower (probably ~10x
slower for the throughput-bound operations and ~2x slower on the read
path). As compaction/repair/streaming usually take up between 10-20% of
available CPU cycles, making them 2x slower might show up as a <10%
overall utilization increase even when you've really regressed 100% or
more on key metrics (compaction throughput, streaming throughput, memory
allocation rate, etc.). For example, if compaction was able to achieve
2 MiBps of throughput before encryption and only 1 MiBps afterwards,
that would be a huge real-world impact to operators, as compactions now
take twice as long.
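For concreteness, something along these lines is what I have in mind for
measuring the JVM side of the gap (a rough sketch, not a proper JMH
benchmark; the class name, 1 MiB buffer size, and iteration count are
arbitrary choices, and it only exercises javax.crypto AES-GCM
encryption):

```java
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class AesGcmBench {
    /** Encrypts `iterations` one-MiB buffers with AES-128-GCM and returns MiB/s. */
    static double encryptThroughputMiBps(int iterations) throws Exception {
        byte[] key = new byte[16];
        new SecureRandom().nextBytes(key);
        SecretKeySpec keySpec = new SecretKeySpec(key, "AES");
        byte[] plaintext = new byte[1 << 20]; // 1 MiB per operation
        byte[] iv = new byte[12];

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // Vary the IV per encryption; GCM must never reuse a (key, IV) pair.
            iv[0] = (byte) i;
            iv[1] = (byte) (i >> 8);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, keySpec, new GCMParameterSpec(128, iv));
            cipher.doFinal(plaintext);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        // plaintext.length is exactly 1 MiB, so MiB processed == iterations.
        return iterations / seconds;
    }

    public static void main(String[] args) throws Exception {
        System.out.printf("AES-GCM encrypt: %.1f MiB/s%n", encryptThroughputMiBps(256));
    }
}
```

Run the same workload against a native implementation (keypipe, age,
OpenSSL's `openssl speed -evp aes-128-gcm`, etc.) on the same box and
compare; decryption and the larger proposed dataset sizes would need the
obvious variations.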
I think a CEP or details on the ticket that indicate the performance
tests and workloads that will be run might be wise? Perhaps something
like "encryption creates no more than a 1% regression of: compaction
throughput (MiBps), streaming throughput (MiBps), repair validation
throughput (duration of full repair on the entire cluster), read
throughput at 10ms p99 tail at quorum consistency (QPS handled while not
exceeding a p99 SLO of 10ms), etc., while a sustained load is applied to
a multi-node cluster"?

Even a microbenchmark that just measures how long it takes to encrypt
and decrypt a 500 MiB dataset using the proposed JVM implementation
versus a native implementation might be enough to confirm/deny. For
example, keypipe (C, [3]) achieves around 2.8 GiBps symmetric of
AES-GCM, and age (golang, ChaCha20-Poly1305, [4]) achieves about 1.6
GiBps encryption and 1.0 GiBps decryption; from my past experience with
Java crypto, it would achieve maybe 200 MiBps of _non-authenticated_
AES.

Cheers,
-Joey

[1] https://issues.apache.org/jira/browse/CASSANDRA-15294
[2] https://github.com/corretto/amazon-corretto-crypto-provider
[3] https://github.com/hashbrowncipher/keypipe#encryption
[4] https://github.com/FiloSottile/age