Re: RFR: 8259498: Reduce overhead of MD5 and SHA digests [v4]
On Mon, 18 Jan 2021 13:39:04 GMT, Claes Redestad wrote:

>> - The MD5 intrinsics added by [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) shows that the `int[] x` isn't actually needed. This also applies to the SHA intrinsics from which the MD5 intrinsic takes inspiration
>> - Using VarHandles we can simplify the code in `ByteArrayAccess` enough to make it acceptable to use inline and replace the array in MD5 wholesale. This improves performance both in the presence and the absence of the intrinsic optimization.
>> - Doing the exact same thing in the SHA impls would be unwieldy (64+ element arrays), but allocating the array lazily gets most of the speed-up in the presence of an intrinsic while being neutral in its absence.
>>
>> Baseline:
>> Benchmark                    (digesterName)  (length)  Cnt     Score     Error   Units
>> MessageDigests.digest                   MD5        16   15  2714.307 ±  21.133  ops/ms
>> MessageDigests.digest                   MD5      1024   15   318.087 ±   0.637  ops/ms
>> MessageDigests.digest                 SHA-1        16   15  1387.266 ±  40.932  ops/ms
>> MessageDigests.digest                 SHA-1      1024   15   109.273 ±   0.149  ops/ms
>> MessageDigests.digest               SHA-256        16   15   995.566 ±  21.186  ops/ms
>> MessageDigests.digest               SHA-256      1024   15    89.104 ±   0.079  ops/ms
>> MessageDigests.digest               SHA-512        16   15   803.030 ±  15.722  ops/ms
>> MessageDigests.digest               SHA-512      1024   15   115.611 ±   0.234  ops/ms
>> MessageDigests.getAndDigest             MD5        16   15  2190.367 ±  97.037  ops/ms
>> MessageDigests.getAndDigest             MD5      1024   15   302.903 ±   1.809  ops/ms
>> MessageDigests.getAndDigest           SHA-1        16   15  1262.656 ±  43.751  ops/ms
>> MessageDigests.getAndDigest           SHA-1      1024   15   104.889 ±   3.554  ops/ms
>> MessageDigests.getAndDigest         SHA-256        16   15   914.541 ±  55.621  ops/ms
>> MessageDigests.getAndDigest         SHA-256      1024   15    85.708 ±   1.394  ops/ms
>> MessageDigests.getAndDigest         SHA-512        16   15   737.719 ±  53.671  ops/ms
>> MessageDigests.getAndDigest         SHA-512      1024   15   112.307 ±   1.950  ops/ms
>>
>> GC:
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm             MD5        16   15   312.011 ±   0.005    B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm           SHA-1        16   15   584.020 ±   0.006    B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-256        16   15   544.019 ±   0.016    B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-512        16   15  1056.037 ±   0.003    B/op
>>
>> Target:
>> Benchmark                    (digesterName)  (length)  Cnt     Score     Error   Units
>> MessageDigests.digest                   MD5        16   15  3134.462 ±  43.685  ops/ms
>> MessageDigests.digest                   MD5      1024   15   323.667 ±   0.633  ops/ms
>> MessageDigests.digest                 SHA-1        16   15  1418.742 ±  38.223  ops/ms
>> MessageDigests.digest                 SHA-1      1024   15   110.178 ±   0.788  ops/ms
>> MessageDigests.digest               SHA-256        16   15  1037.949 ±  21.214  ops/ms
>> MessageDigests.digest               SHA-256      1024   15    89.671 ±   0.228  ops/ms
>> MessageDigests.digest               SHA-512        16   15   812.028 ±  39.489  ops/ms
>> MessageDigests.digest               SHA-512      1024   15   116.738 ±   0.249  ops/ms
>> MessageDigests.getAndDigest             MD5        16   15  2314.379 ± 229.294  ops/ms
>> MessageDigests.getAndDigest             MD5      1024   15   307.835 ±   5.730  ops/ms
>> MessageDigests.getAndDigest           SHA-1        16   15  1326.887 ±  63.263  ops/ms
>> MessageDigests.getAndDigest           SHA-1      1024   15   106.611 ±   2.292  ops/ms
>> MessageDigests.getAndDigest         SHA-256        16   15   961.589 ±  82.052  ops/ms
>> MessageDigests.getAndDigest         SHA-256      1024   15    88.646 ±   0.194  ops/ms
>> MessageDigests.getAndDigest         SHA-512        16   15   775.417 ±  56.775  ops/ms
>> MessageDigests.getAndDigest         SHA-512      1024   15   112.904 ±   2.014  ops/ms
>>
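The lazy-allocation idea in the third bullet of the quoted description can be sketched roughly as follows. This is only an illustration under stated assumptions: the class `LazyScratchDigest`, its toy "compression" arithmetic, and the `fastPath` flag (which stands in for an intrinsic that never touches the scratch array) are all hypothetical and are not the JDK's SHA code.

```java
/** Hypothetical illustration of lazy scratch-array allocation; not the JDK's SHA code. */
final class LazyScratchDigest {
    private int[] w;      // scratch space, left null until the slow path runs
    private int state;    // toy running state standing in for the real digest state

    /** fastPath=true simulates an intrinsic that never touches the scratch array. */
    void compress(byte[] block, boolean fastPath) {
        if (fastPath) {
            for (byte b : block) {
                state = 31 * state + b;
            }
            return;
        }
        if (w == null) {
            w = new int[64];   // allocated once, on first slow-path use
        }
        for (int i = 0; i < block.length; i++) {
            w[i & 63] = block[i];
            state = 31 * state + w[i & 63];
        }
    }

    boolean scratchAllocated() { return w != null; }

    int state() { return state; }

    public static void main(String[] args) {
        LazyScratchDigest fast = new LazyScratchDigest();
        fast.compress(new byte[] {1, 2, 3}, true);
        LazyScratchDigest slow = new LazyScratchDigest();
        slow.compress(new byte[] {1, 2, 3}, false);
        System.out.println(fast.scratchAllocated() + " " + slow.scratchAllocated());
    }
}
```

An instance that only ever takes the fast path never pays for the array, while the slow path allocates it once and reuses it, which matches the "neutral in its absence" claim in spirit.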
Re: RFR: 8259498: Reduce overhead of MD5 and SHA digests [v4]
> - The MD5 intrinsics added by [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) shows that the `int[] x` isn't actually needed. This also applies to the SHA intrinsics from which the MD5 intrinsic takes inspiration
> - Using VarHandles we can simplify the code in `ByteArrayAccess` enough to make it acceptable to use inline and replace the array in MD5 wholesale. This improves performance both in the presence and the absence of the intrinsic optimization.
> - Doing the exact same thing in the SHA impls would be unwieldy (64+ element arrays), but allocating the array lazily gets most of the speed-up in the presence of an intrinsic while being neutral in its absence.
>
> Baseline:
> Benchmark                    (digesterName)  (length)  Cnt     Score     Error   Units
> MessageDigests.digest                   MD5        16   15  2714.307 ±  21.133  ops/ms
> MessageDigests.digest                   MD5      1024   15   318.087 ±   0.637  ops/ms
> MessageDigests.digest                 SHA-1        16   15  1387.266 ±  40.932  ops/ms
> MessageDigests.digest                 SHA-1      1024   15   109.273 ±   0.149  ops/ms
> MessageDigests.digest               SHA-256        16   15   995.566 ±  21.186  ops/ms
> MessageDigests.digest               SHA-256      1024   15    89.104 ±   0.079  ops/ms
> MessageDigests.digest               SHA-512        16   15   803.030 ±  15.722  ops/ms
> MessageDigests.digest               SHA-512      1024   15   115.611 ±   0.234  ops/ms
> MessageDigests.getAndDigest             MD5        16   15  2190.367 ±  97.037  ops/ms
> MessageDigests.getAndDigest             MD5      1024   15   302.903 ±   1.809  ops/ms
> MessageDigests.getAndDigest           SHA-1        16   15  1262.656 ±  43.751  ops/ms
> MessageDigests.getAndDigest           SHA-1      1024   15   104.889 ±   3.554  ops/ms
> MessageDigests.getAndDigest         SHA-256        16   15   914.541 ±  55.621  ops/ms
> MessageDigests.getAndDigest         SHA-256      1024   15    85.708 ±   1.394  ops/ms
> MessageDigests.getAndDigest         SHA-512        16   15   737.719 ±  53.671  ops/ms
> MessageDigests.getAndDigest         SHA-512      1024   15   112.307 ±   1.950  ops/ms
>
> GC:
> MessageDigests.getAndDigest:·gc.alloc.rate.norm             MD5        16   15   312.011 ±   0.005    B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm           SHA-1        16   15   584.020 ±   0.006    B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-256        16   15   544.019 ±   0.016    B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-512        16   15  1056.037 ±   0.003    B/op
>
> Target:
> Benchmark                    (digesterName)  (length)  Cnt     Score     Error   Units
> MessageDigests.digest                   MD5        16   15  3134.462 ±  43.685  ops/ms
> MessageDigests.digest                   MD5      1024   15   323.667 ±   0.633  ops/ms
> MessageDigests.digest                 SHA-1        16   15  1418.742 ±  38.223  ops/ms
> MessageDigests.digest                 SHA-1      1024   15   110.178 ±   0.788  ops/ms
> MessageDigests.digest               SHA-256        16   15  1037.949 ±  21.214  ops/ms
> MessageDigests.digest               SHA-256      1024   15    89.671 ±   0.228  ops/ms
> MessageDigests.digest               SHA-512        16   15   812.028 ±  39.489  ops/ms
> MessageDigests.digest               SHA-512      1024   15   116.738 ±   0.249  ops/ms
> MessageDigests.getAndDigest             MD5        16   15  2314.379 ± 229.294  ops/ms
> MessageDigests.getAndDigest             MD5      1024   15   307.835 ±   5.730  ops/ms
> MessageDigests.getAndDigest           SHA-1        16   15  1326.887 ±  63.263  ops/ms
> MessageDigests.getAndDigest           SHA-1      1024   15   106.611 ±   2.292  ops/ms
> MessageDigests.getAndDigest         SHA-256        16   15   961.589 ±  82.052  ops/ms
> MessageDigests.getAndDigest         SHA-256      1024   15    88.646 ±   0.194  ops/ms
> MessageDigests.getAndDigest         SHA-512        16   15   775.417 ±  56.775  ops/ms
> MessageDigests.getAndDigest         SHA-512      1024   15   112.904 ±   2.014  ops/ms
>
> GC
> MessageDigests.getAndDigest:·gc.alloc.rate.norm             MD5        16   15   232.009 ±   0.006    B/op
> MessageDigests.getAndDigest:·gc.alloc.r
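As a rough sanity check on the MD5 `gc.alloc.rate.norm` figures quoted above (312.011 B/op baseline, 232.009 B/op target), the difference is about 80 bytes, which is what one would expect if a single `int[16]` is no longer allocated per digest. The sketch below assumes a 16-byte array header and 8-byte object alignment, which is typical for 64-bit HotSpot with compressed class pointers but is not stated in the thread; `AllocDelta` and `intArrayBytes` are hypothetical names used only for this illustration.

```java
/** Back-of-the-envelope check of the MD5 allocation delta; header size is an assumption. */
final class AllocDelta {
    /** Estimated heap size of an int[length], assuming headerBytes of header and 8-byte alignment. */
    static long intArrayBytes(int length, int headerBytes) {
        long raw = headerBytes + 4L * length;
        return (raw + 7) / 8 * 8;   // round up to the next multiple of 8
    }

    /** Difference between two normalized allocation rates, rounded to whole bytes. */
    static long savedBytes(double baselineBytesPerOp, double targetBytesPerOp) {
        return Math.round(baselineBytesPerOp - targetBytesPerOp);
    }

    public static void main(String[] args) {
        // Figures taken from the quoted Baseline and Target GC rows for MD5, length 16.
        System.out.println("saved per op:     " + savedBytes(312.011, 232.009) + " bytes");
        System.out.println("int[16] estimate: " + intArrayBytes(16, 16) + " bytes");
    }
}
```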
Re: RFR: 8259498: Reduce overhead of MD5 and SHA digests [v2]
On Fri, 15 Jan 2021 23:36:35 GMT, Claes Redestad wrote:

>> - The MD5 intrinsics added by [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) shows that the `int[] x` isn't actually needed. This also applies to the SHA intrinsics from which the MD5 intrinsic takes inspiration
>> - Using VarHandles we can simplify the code in `ByteArrayAccess` enough to make it acceptable to use inline and replace the array in MD5 wholesale. This improves performance both in the presence and the absence of the intrinsic optimization.
>> - Doing the exact same thing in the SHA impls would be unwieldy (64+ element arrays), but allocating the array lazily gets most of the speed-up in the presence of an intrinsic while being neutral in its absence.
>>
>> Baseline:
>> Benchmark                    (digesterName)  (length)  Cnt     Score     Error   Units
>> MessageDigests.digest                   MD5        16   15  2714.307 ±  21.133  ops/ms
>> MessageDigests.digest                   MD5      1024   15   318.087 ±   0.637  ops/ms
>> MessageDigests.digest                 SHA-1        16   15  1387.266 ±  40.932  ops/ms
>> MessageDigests.digest                 SHA-1      1024   15   109.273 ±   0.149  ops/ms
>> MessageDigests.digest               SHA-256        16   15   995.566 ±  21.186  ops/ms
>> MessageDigests.digest               SHA-256      1024   15    89.104 ±   0.079  ops/ms
>> MessageDigests.digest               SHA-512        16   15   803.030 ±  15.722  ops/ms
>> MessageDigests.digest               SHA-512      1024   15   115.611 ±   0.234  ops/ms
>> MessageDigests.getAndDigest             MD5        16   15  2190.367 ±  97.037  ops/ms
>> MessageDigests.getAndDigest             MD5      1024   15   302.903 ±   1.809  ops/ms
>> MessageDigests.getAndDigest           SHA-1        16   15  1262.656 ±  43.751  ops/ms
>> MessageDigests.getAndDigest           SHA-1      1024   15   104.889 ±   3.554  ops/ms
>> MessageDigests.getAndDigest         SHA-256        16   15   914.541 ±  55.621  ops/ms
>> MessageDigests.getAndDigest         SHA-256      1024   15    85.708 ±   1.394  ops/ms
>> MessageDigests.getAndDigest         SHA-512        16   15   737.719 ±  53.671  ops/ms
>> MessageDigests.getAndDigest         SHA-512      1024   15   112.307 ±   1.950  ops/ms
>>
>> GC:
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm             MD5        16   15   312.011 ±   0.005    B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm           SHA-1        16   15   584.020 ±   0.006    B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-256        16   15   544.019 ±   0.016    B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-512        16   15  1056.037 ±   0.003    B/op
>>
>> Target:
>> Benchmark                    (digesterName)  (length)  Cnt     Score     Error   Units
>> MessageDigests.digest                   MD5        16   15  3134.462 ±  43.685  ops/ms
>> MessageDigests.digest                   MD5      1024   15   323.667 ±   0.633  ops/ms
>> MessageDigests.digest                 SHA-1        16   15  1418.742 ±  38.223  ops/ms
>> MessageDigests.digest                 SHA-1      1024   15   110.178 ±   0.788  ops/ms
>> MessageDigests.digest               SHA-256        16   15  1037.949 ±  21.214  ops/ms
>> MessageDigests.digest               SHA-256      1024   15    89.671 ±   0.228  ops/ms
>> MessageDigests.digest               SHA-512        16   15   812.028 ±  39.489  ops/ms
>> MessageDigests.digest               SHA-512      1024   15   116.738 ±   0.249  ops/ms
>> MessageDigests.getAndDigest             MD5        16   15  2314.379 ± 229.294  ops/ms
>> MessageDigests.getAndDigest             MD5      1024   15   307.835 ±   5.730  ops/ms
>> MessageDigests.getAndDigest           SHA-1        16   15  1326.887 ±  63.263  ops/ms
>> MessageDigests.getAndDigest           SHA-1      1024   15   106.611 ±   2.292  ops/ms
>> MessageDigests.getAndDigest         SHA-256        16   15   961.589 ±  82.052  ops/ms
>> MessageDigests.getAndDigest         SHA-256      1024   15    88.646 ±   0.194  ops/ms
>> MessageDigests.getAndDigest         SHA-512        16   15   775.417 ±  56.775  ops/ms
>> MessageDigests.getAndDigest         SHA-512      1024   15   112.904 ±   2.014  ops/ms
>>
Re: RFR: 8259498: Reduce overhead of MD5 and SHA digests [v2]
On Fri, 15 Jan 2021 23:21:00 GMT, Valerie Peng wrote:

>> Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision:
>>
>>  - Copyrights
>>  - Merge branch 'master' into improve_md5
>>  - Remove unused Unsafe import
>>  - Harmonize MD4 impl, remove now-redundant checks from ByteArrayAccess (VHs do bounds checks, most of which will be optimized away)
>>  - Merge branch 'master' into improve_md5
>>  - Apply allocation avoiding optimizations to all SHA versions sharing structural similarities with MD5
>>  - Remove unused reverseBytes imports
>>  - Copyrights
>>  - Fix copy-paste error
>>  - Various fixes (IDE stopped IDEing..)
>>  - ... and 10 more: https://git.openjdk.java.net/jdk/compare/6e03c8d3...cafa3e49
>
> test/micro/org/openjdk/bench/java/util/UUIDBench.java line 2:
>
>> 1: /*
>> 2:  * Copyright (c) 2020, 2021, Oracle and/or its affiliates. All rights reserved.
>
> nit: other files should also have this 2021 update. It seems most of them are not updated and still use 2020.

fixed

PR: https://git.openjdk.java.net/jdk/pull/1855
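The commit note that "VHs do bounds checks" can be illustrated with a small sketch. It reads a little-endian `int` from a `byte[]` through a byte-array view `VarHandle` and compares it with hand-written shifts. The class and method names (`LittleEndianInts`, `viaVarHandle`, `viaShifts`, `throwsOutOfBounds`) are made up for this example and are not the actual `ByteArrayAccess` code.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

/** Hypothetical sketch of VarHandle-based byte[] access; not the JDK's ByteArrayAccess. */
final class LittleEndianInts {
    // Views a byte[] as little-endian ints; the index passed to get() is a byte offset.
    private static final VarHandle LE_INT =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    static int viaVarHandle(byte[] buf, int ofs) {
        return (int) LE_INT.get(buf, ofs);
    }

    /** The same decoding written out by hand, for comparison. */
    static int viaShifts(byte[] buf, int ofs) {
        return (buf[ofs] & 0xff)
                | (buf[ofs + 1] & 0xff) << 8
                | (buf[ofs + 2] & 0xff) << 16
                | (buf[ofs + 3] & 0xff) << 24;
    }

    /** Returns true if the VarHandle rejects the offset with an IndexOutOfBoundsException. */
    static boolean throwsOutOfBounds(byte[] buf, int ofs) {
        try {
            viaVarHandle(buf, ofs);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] buf = {0x78, 0x56, 0x34, 0x12, 0x01};
        System.out.println(Integer.toHexString(viaVarHandle(buf, 0)));
        System.out.println(throwsOutOfBounds(buf, 2));   // only 3 bytes remain at offset 2
    }
}
```

Because the handle itself rejects an out-of-range offset, an explicit range check before each access adds nothing, which is presumably what made the checks in `ByteArrayAccess` redundant.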
Re: RFR: 8259498: Reduce overhead of MD5 and SHA digests
On Sun, 20 Dec 2020 20:27:03 GMT, Claes Redestad wrote:

> - The MD5 intrinsics added by [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) shows that the `int[] x` isn't actually needed. This also applies to the SHA intrinsics from which the MD5 intrinsic takes inspiration
> - Using VarHandles we can simplify the code in `ByteArrayAccess` enough to make it acceptable to use inline and replace the array in MD5 wholesale. This improves performance both in the presence and the absence of the intrinsic optimization.
> - Doing the exact same thing in the SHA impls would be unwieldy (64+ element arrays), but allocating the array lazily gets most of the speed-up in the presence of an intrinsic while being neutral in its absence.
>
> Baseline:
> Benchmark                    (digesterName)  (length)  Cnt     Score     Error   Units
> MessageDigests.digest                   MD5        16   15  2714.307 ±  21.133  ops/ms
> MessageDigests.digest                   MD5      1024   15   318.087 ±   0.637  ops/ms
> MessageDigests.digest                 SHA-1        16   15  1387.266 ±  40.932  ops/ms
> MessageDigests.digest                 SHA-1      1024   15   109.273 ±   0.149  ops/ms
> MessageDigests.digest               SHA-256        16   15   995.566 ±  21.186  ops/ms
> MessageDigests.digest               SHA-256      1024   15    89.104 ±   0.079  ops/ms
> MessageDigests.digest               SHA-512        16   15   803.030 ±  15.722  ops/ms
> MessageDigests.digest               SHA-512      1024   15   115.611 ±   0.234  ops/ms
> MessageDigests.getAndDigest             MD5        16   15  2190.367 ±  97.037  ops/ms
> MessageDigests.getAndDigest             MD5      1024   15   302.903 ±   1.809  ops/ms
> MessageDigests.getAndDigest           SHA-1        16   15  1262.656 ±  43.751  ops/ms
> MessageDigests.getAndDigest           SHA-1      1024   15   104.889 ±   3.554  ops/ms
> MessageDigests.getAndDigest         SHA-256        16   15   914.541 ±  55.621  ops/ms
> MessageDigests.getAndDigest         SHA-256      1024   15    85.708 ±   1.394  ops/ms
> MessageDigests.getAndDigest         SHA-512        16   15   737.719 ±  53.671  ops/ms
> MessageDigests.getAndDigest         SHA-512      1024   15   112.307 ±   1.950  ops/ms
>
> GC:
> MessageDigests.getAndDigest:·gc.alloc.rate.norm             MD5        16   15   312.011 ±   0.005    B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm           SHA-1        16   15   584.020 ±   0.006    B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-256        16   15   544.019 ±   0.016    B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-512        16   15  1056.037 ±   0.003    B/op
>
> Target:
> Benchmark                    (digesterName)  (length)  Cnt     Score     Error   Units
> MessageDigests.digest                   MD5        16   15  3134.462 ±  43.685  ops/ms
> MessageDigests.digest                   MD5      1024   15   323.667 ±   0.633  ops/ms
> MessageDigests.digest                 SHA-1        16   15  1418.742 ±  38.223  ops/ms
> MessageDigests.digest                 SHA-1      1024   15   110.178 ±   0.788  ops/ms
> MessageDigests.digest               SHA-256        16   15  1037.949 ±  21.214  ops/ms
> MessageDigests.digest               SHA-256      1024   15    89.671 ±   0.228  ops/ms
> MessageDigests.digest               SHA-512        16   15   812.028 ±  39.489  ops/ms
> MessageDigests.digest               SHA-512      1024   15   116.738 ±   0.249  ops/ms
> MessageDigests.getAndDigest             MD5        16   15  2314.379 ± 229.294  ops/ms
> MessageDigests.getAndDigest             MD5      1024   15   307.835 ±   5.730  ops/ms
> MessageDigests.getAndDigest           SHA-1        16   15  1326.887 ±  63.263  ops/ms
> MessageDigests.getAndDigest           SHA-1      1024   15   106.611 ±   2.292  ops/ms
> MessageDigests.getAndDigest         SHA-256        16   15   961.589 ±  82.052  ops/ms
> MessageDigests.getAndDigest         SHA-256      1024   15    88.646 ±   0.194  ops/ms
> MessageDigests.getAndDigest         SHA-512        16   15   775.417 ±  56.775  ops/ms
> MessageDigests.getAndDigest         SHA-512      1024   15   112.904 ±   2.014  ops/ms
>
> GC
> MessageDigests.getAndDigest:·gc.alloc.rate.norm             MD5        16   15   232.009 ±
Re: RFR: 8259498: Reduce overhead of MD5 and SHA digests
On Fri, 15 Jan 2021 22:54:32 GMT, Valerie Peng wrote:

>> - The MD5 intrinsics added by [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) shows that the `int[] x` isn't actually needed. This also applies to the SHA intrinsics from which the MD5 intrinsic takes inspiration
>> - Using VarHandles we can simplify the code in `ByteArrayAccess` enough to make it acceptable to use inline and replace the array in MD5 wholesale. This improves performance both in the presence and the absence of the intrinsic optimization.
>> - Doing the exact same thing in the SHA impls would be unwieldy (64+ element arrays), but allocating the array lazily gets most of the speed-up in the presence of an intrinsic while being neutral in its absence.
>>
>> Baseline:
>> Benchmark                    (digesterName)  (length)  Cnt     Score     Error   Units
>> MessageDigests.digest                   MD5        16   15  2714.307 ±  21.133  ops/ms
>> MessageDigests.digest                   MD5      1024   15   318.087 ±   0.637  ops/ms
>> MessageDigests.digest                 SHA-1        16   15  1387.266 ±  40.932  ops/ms
>> MessageDigests.digest                 SHA-1      1024   15   109.273 ±   0.149  ops/ms
>> MessageDigests.digest               SHA-256        16   15   995.566 ±  21.186  ops/ms
>> MessageDigests.digest               SHA-256      1024   15    89.104 ±   0.079  ops/ms
>> MessageDigests.digest               SHA-512        16   15   803.030 ±  15.722  ops/ms
>> MessageDigests.digest               SHA-512      1024   15   115.611 ±   0.234  ops/ms
>> MessageDigests.getAndDigest             MD5        16   15  2190.367 ±  97.037  ops/ms
>> MessageDigests.getAndDigest             MD5      1024   15   302.903 ±   1.809  ops/ms
>> MessageDigests.getAndDigest           SHA-1        16   15  1262.656 ±  43.751  ops/ms
>> MessageDigests.getAndDigest           SHA-1      1024   15   104.889 ±   3.554  ops/ms
>> MessageDigests.getAndDigest         SHA-256        16   15   914.541 ±  55.621  ops/ms
>> MessageDigests.getAndDigest         SHA-256      1024   15    85.708 ±   1.394  ops/ms
>> MessageDigests.getAndDigest         SHA-512        16   15   737.719 ±  53.671  ops/ms
>> MessageDigests.getAndDigest         SHA-512      1024   15   112.307 ±   1.950  ops/ms
>>
>> GC:
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm             MD5        16   15   312.011 ±   0.005    B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm           SHA-1        16   15   584.020 ±   0.006    B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-256        16   15   544.019 ±   0.016    B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-512        16   15  1056.037 ±   0.003    B/op
>>
>> Target:
>> Benchmark                    (digesterName)  (length)  Cnt     Score     Error   Units
>> MessageDigests.digest                   MD5        16   15  3134.462 ±  43.685  ops/ms
>> MessageDigests.digest                   MD5      1024   15   323.667 ±   0.633  ops/ms
>> MessageDigests.digest                 SHA-1        16   15  1418.742 ±  38.223  ops/ms
>> MessageDigests.digest                 SHA-1      1024   15   110.178 ±   0.788  ops/ms
>> MessageDigests.digest               SHA-256        16   15  1037.949 ±  21.214  ops/ms
>> MessageDigests.digest               SHA-256      1024   15    89.671 ±   0.228  ops/ms
>> MessageDigests.digest               SHA-512        16   15   812.028 ±  39.489  ops/ms
>> MessageDigests.digest               SHA-512      1024   15   116.738 ±   0.249  ops/ms
>> MessageDigests.getAndDigest             MD5        16   15  2314.379 ± 229.294  ops/ms
>> MessageDigests.getAndDigest             MD5      1024   15   307.835 ±   5.730  ops/ms
>> MessageDigests.getAndDigest           SHA-1        16   15  1326.887 ±  63.263  ops/ms
>> MessageDigests.getAndDigest           SHA-1      1024   15   106.611 ±   2.292  ops/ms
>> MessageDigests.getAndDigest         SHA-256        16   15   961.589 ±  82.052  ops/ms
>> MessageDigests.getAndDigest         SHA-256      1024   15    88.646 ±   0.194  ops/ms
>> MessageDigests.getAndDigest         SHA-512        16   15   775.417 ±  56.775  ops/ms
>> MessageDigests.getAndDigest         SHA-512      1024   15   112.904 ±   2.014  ops/ms
>>
>> G
Re: RFR: 8259498: Reduce overhead of MD5 and SHA digests
On Thu, 7 Jan 2021 18:50:05 GMT, Claes Redestad wrote:

>> Removing the UUID clone cache and running the microbenchmark along with the changes in #1933:
>>
>> Benchmark                                     (size)   Mode  Cnt    Score    Error   Units
>> UUIDBench.fromType3Bytes                           2  thrpt   12    2.182 ±  0.090  ops/us
>> UUIDBench.fromType3Bytes:·gc.alloc.rate            2  thrpt   12  439.020 ± 18.241  MB/sec
>> UUIDBench.fromType3Bytes:·gc.alloc.rate.norm       2  thrpt   12  264.022 ±  0.003    B/op
>>
>> The goal now is to simplify the digest code and compare alternatives.
>
> I've run various tests and concluded that the `VarHandle`ized code is matching or improving upon the `Unsafe`-riddled code in `ByteArrayAccess`. I then went ahead and consolidated to use a similar code pattern in `ByteArrayAccess` for consistency, which amounts to a good cleanup.
>
> With MD5 intrinsics disabled, I get this baseline:
>
> Benchmark                                     (size)   Mode  Cnt    Score    Error   Units
> UUIDBench.fromType3Bytes                           2  thrpt   12    1.245 ±  0.077  ops/us
> UUIDBench.fromType3Bytes:·gc.alloc.rate.norm       2  thrpt   12  488.042 ±  0.004    B/op
>
> With the current patch here (not including #1933):
>
> Benchmark                                     (size)   Mode  Cnt    Score    Error   Units
> UUIDBench.fromType3Bytes                           2  thrpt   12    1.431 ±  0.106  ops/us
> UUIDBench.fromType3Bytes:·gc.alloc.rate.norm       2  thrpt   12  408.035 ±  0.006    B/op
>
> If I isolate the `ByteArrayAccess` changes I'm getting performance neutral or slightly better numbers compared to baseline for these tests:
>
> Benchmark                                     (size)   Mode  Cnt    Score    Error   Units
> UUIDBench.fromType3Bytes                           2  thrpt   12    1.317 ±  0.092  ops/us
> UUIDBench.fromType3Bytes:·gc.alloc.rate.norm       2  thrpt   12  488.042 ±  0.004    B/op

Thanks for the performance enhancement, I will take a look.

PR: https://git.openjdk.java.net/jdk/pull/1855
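For readers unfamiliar with the "clone cache" mentioned in the quoted text, one plausible reading is a cached prototype `MessageDigest` that is cloned per use instead of being looked up through the provider framework each time. The sketch below shows that pattern using only the public `java.security` API. Whether this matches what the UUID change actually did is an assumption, and `Md5CloneCache` and its method names are hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of a prototype-and-clone MD5 cache; not the java.util.UUID code. */
final class Md5CloneCache {
    // Never updated directly, so every clone starts from a clean initial state.
    private static final MessageDigest PROTOTYPE = lookup();

    private static MessageDigest lookup() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new InternalError("MD5 not supported", e);
        }
    }

    /** Goes through the provider lookup on every call. */
    static byte[] digestWithLookup(byte[] input) {
        return lookup().digest(input);
    }

    /** Clones the cached prototype, falling back to a lookup if the provider cannot clone. */
    static byte[] digestWithClone(byte[] input) {
        MessageDigest md;
        try {
            md = (MessageDigest) PROTOTYPE.clone();
        } catch (CloneNotSupportedException e) {
            md = lookup();
        }
        return md.digest(input);
    }
}
```

Both paths produce the same 16-byte digest; the difference is only in how the `MessageDigest` instance is obtained, which is the overhead the benchmarks in this thread are probing.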
Re: RFR: 8259498: Reduce overhead of MD5 and SHA digests
On Tue, 5 Jan 2021 23:08:43 GMT, DellCliff wrote:

>> Since `java.util.UUID` and `sun.security.provider.MD5` are both in `java.base`, would it make sense to create new instances by calling `new MD5()` instead of `java.security.MessageDigest.getInstance("MD5")` and bypassing the whole MessageDigest logic?
>
> Are you sure you're not ending up paying more using a VarHandle and having to cast and using a var args call `(long) LONG_ARRAY_HANDLE.get(buf, ofs);` instead of creating a ByteBuffer once via `ByteBuffer.wrap(buffer).order(ByteOrder.nativeOrder()).asLongBuffer()`?

Hitting up `new MD5()` directly could be a great idea. I expect this would be just as fast as the cache+clone approach (if not faster). I'm a bit worried that we'd be short-circuiting the ability to install an alternative MD5 provider (which may or may not be a thing we must support), but it's worth exploring.

Comparing performance of this against a `ByteBuffer` impl is on my TODO. The `VarHandle` gets heavily inlined and optimized here, though, with performance in my tests similar to the `Unsafe` use in `ByteArrayAccess`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
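To make the two alternatives in this exchange concrete, here is a minimal sketch that reads a `long` from a `byte[]` both ways. The class and method names are hypothetical and this is not the code from the PR; it also uses the absolute `ByteBuffer.getLong(int)` with a byte offset rather than the `asLongBuffer()` view from the question, purely to keep the comparison short. It says nothing about relative performance, which would need a JMH run.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class, not part of the JDK or the PR.
final class LongReadSketch {
    // Views a byte[] as longs in native byte order; the index passed to get() is a byte offset.
    private static final VarHandle LONG_ARRAY_HANDLE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());

    // The access pattern quoted in the question: a signature-polymorphic call plus a cast.
    static long readWithVarHandle(byte[] buf, int ofs) {
        return (long) LONG_ARRAY_HANDLE.get(buf, ofs);
    }

    // The ByteBuffer alternative; here the buffer is wrapped on every call for simplicity.
    static long readWithByteBuffer(byte[] buf, int ofs) {
        return ByteBuffer.wrap(buf).order(ByteOrder.nativeOrder()).getLong(ofs);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[16];
        for (int i = 0; i < buf.length; i++) {
            buf[i] = (byte) (i + 1);
        }
        // Both paths decode the same eight bytes with the same byte order.
        System.out.println(readWithVarHandle(buf, 8) == readWithByteBuffer(buf, 8));
    }
}
```

Although `VarHandle.get` is declared with varargs, it is signature-polymorphic, so the call site is compiled against the exact `(byte[], int)long` shape rather than boxing the arguments into an `Object[]`.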
Re: RFR: 8259498: Reduce overhead of MD5 and SHA digests
On Thu, 7 Jan 2021 14:45:03 GMT, Claes Redestad wrote:

>> I've identified a number of optimizations to the plumbing behind `MessageDigest.getDigest(..)` over in #1933 that remove 80-90% of the throughput overhead and all the allocation overhead compared to the `clone()` approach prototyped here. The remaining 20ns/op overhead might not be enough of a concern to do a point fix in `UUID::nameUUIDFromBytes`.
>
> Removing the UUID clone cache and running the microbenchmark along with the changes in #1933:
>
> Benchmark                                     (size)   Mode  Cnt    Score    Error   Units
> UUIDBench.fromType3Bytes                           2  thrpt   12    2.182 ±  0.090  ops/us
> UUIDBench.fromType3Bytes:·gc.alloc.rate            2  thrpt   12  439.020 ± 18.241  MB/sec
> UUIDBench.fromType3Bytes:·gc.alloc.rate.norm       2  thrpt   12  264.022 ±  0.003    B/op
>
> The goal now is to simplify the digest code and compare alternatives.

I've run various tests and concluded that the `VarHandle`ized code matches or improves upon the `Unsafe`-riddled code in `ByteArrayAccess`. I then went ahead and consolidated `ByteArrayAccess` to use a similar code pattern for consistency, which amounts to a good cleanup.

With MD5 intrinsics disabled, I get this baseline:

Benchmark                                     (size)   Mode  Cnt    Score    Error   Units
UUIDBench.fromType3Bytes                           2  thrpt   12    1.245 ±  0.077  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm       2  thrpt   12  488.042 ±  0.004    B/op

With the current patch here (not including #1933):

Benchmark                                     (size)   Mode  Cnt    Score    Error   Units
UUIDBench.fromType3Bytes                           2  thrpt   12    1.431 ±  0.106  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm       2  thrpt   12  408.035 ±  0.006    B/op

If I isolate the `ByteArrayAccess` changes, I get performance-neutral or slightly better numbers compared to baseline for these tests:

Benchmark                                     (size)   Mode  Cnt    Score    Error   Units
UUIDBench.fromType3Bytes                           2  thrpt   12    1.317 ±  0.092  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm       2  thrpt   12  488.042 ±  0.004    B/op

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
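For readers who have not seen the pattern, the following is a rough sketch of what a `VarHandle`-based bulk decode can look like, here turning a 64-byte MD5 block into sixteen little-endian `int` words. The class and method names are made up for illustration, and the actual `ByteArrayAccess` code in the PR may be structured differently.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration class; the real ByteArrayAccess may differ.
final class LittleEndianDecodeSketch {
    // Views a byte[] as little-endian ints; the index passed to get() is a byte offset.
    private static final VarHandle LE_INT =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    /** Decodes len bytes (a multiple of 4) from in[inOfs..] into out[outOfs..]. */
    static void decodeLittleEndianInts(byte[] in, int inOfs, int[] out, int outOfs, int len) {
        if (len < 0 || len % 4 != 0) {
            throw new IllegalArgumentException("len must be a non-negative multiple of 4");
        }
        // Out-of-range offsets surface as IndexOutOfBoundsException from the accesses below.
        for (int end = inOfs + len; inOfs < end; inOfs += 4) {
            out[outOfs++] = (int) LE_INT.get(in, inOfs);
        }
    }

    public static void main(String[] args) {
        byte[] block = new byte[64];
        for (int i = 0; i < block.length; i++) {
            block[i] = (byte) i;
        }
        int[] words = new int[16];
        decodeLittleEndianInts(block, 0, words, 0, block.length);
        // Bytes 00 01 02 03 read as a little-endian int give 0x03020100.
        System.out.println(Integer.toHexString(words[0]));
    }
}
```

Because the byte order is fixed in the `VarHandle` itself, the loop body needs no shifts or masks, which is part of what makes this style compact enough to consider inlining at the call site.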
Re: RFR: 8259498: Reduce overhead of MD5 and SHA digests
On Wed, 6 Jan 2021 00:41:29 GMT, Claes Redestad wrote:

>> Are you sure you're not ending up paying more using a VarHandle and having to cast and using a var args call `(long) LONG_ARRAY_HANDLE.get(buf, ofs);` instead of creating a ByteBuffer once via `ByteBuffer.wrap(buffer).order(ByteOrder.nativeOrder()).asLongBuffer()`?
>
> Hitting up `new MD5()` directly could be a great idea. I expect this would be just as fast as the cache+clone approach (if not faster). I'm a bit worried that we'd be short-circuiting the ability to install an alternative MD5 provider (which may or may not be a thing we must support), but it's worth exploring.
>
> Comparing performance of this against a `ByteBuffer` impl is on my TODO. The `VarHandle` gets heavily inlined and optimized here, though, with performance in my tests similar to the `Unsafe` use in `ByteArrayAccess`.

I've identified a number of optimizations to the plumbing behind `MessageDigest.getDigest(..)` over in #1933 that remove 80-90% of the throughput overhead and all the allocation overhead compared to the `clone()` approach prototyped here. The remaining 20ns/op overhead might not be enough of a concern to do a point fix in `UUID::nameUUIDFromBytes`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
Re: RFR: 8259498: Reduce overhead of MD5 and SHA digests
On Wed, 6 Jan 2021 01:27:52 GMT, Claes Redestad wrote:

>> Hitting up `new MD5()` directly could be a great idea. I expect this would be just as fast as the cache+clone approach (if not faster). I'm a bit worried that we'd be short-circuiting the ability to install an alternative MD5 provider (which may or may not be a thing we must support), but it's worth exploring.
>>
>> Comparing performance of this against a `ByteBuffer` impl is on my TODO. The `VarHandle` gets heavily inlined and optimized here, though, with performance in my tests similar to the `Unsafe` use in `ByteArrayAccess`.
>
> I've identified a number of optimizations to the plumbing behind `MessageDigest.getDigest(..)` over in #1933 that remove 80-90% of the throughput overhead and all the allocation overhead compared to the `clone()` approach prototyped here. The remaining 20ns/op overhead might not be enough of a concern to do a point fix in `UUID::nameUUIDFromBytes`.

Removing the UUID clone cache and running the microbenchmark along with the changes in #1933:

Benchmark                                     (size)   Mode  Cnt    Score    Error   Units
UUIDBench.fromType3Bytes                           2  thrpt   12    2.182 ±  0.090  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate            2  thrpt   12  439.020 ± 18.241  MB/sec
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm       2  thrpt   12  264.022 ±  0.003    B/op

The goal now is to simplify the digest code and compare alternatives.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
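As background for why `UUID::nameUUIDFromBytes` is sensitive to digest lookup cost, the sketch below rebuilds a type 3 UUID by hand: one `MessageDigest.getInstance("MD5")` lookup, one digest over the name, then the version and variant bits. The class and method names are hypothetical, and this is an equivalent reconstruction for illustration rather than a copy of the JDK source.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical illustration class, not part of the JDK or the PR.
final class Type3UuidSketch {
    static UUID type3(byte[] name) {
        MessageDigest md;
        try {
            // The provider lookup whose per-call overhead is being discussed in this thread.
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not supported", e);
        }
        byte[] hash = md.digest(name);
        hash[6] &= 0x0f; // clear the version nibble
        hash[6] |= 0x30; // set version 3 (name-based, MD5)
        hash[8] &= 0x3f; // clear the variant bits
        hash[8] |= 0x80; // set the IETF variant
        long msb = 0;
        long lsb = 0;
        // The 16 digest bytes are interpreted big-endian as two longs.
        for (int i = 0; i < 8; i++) {
            msb = (msb << 8) | (hash[i] & 0xff);
        }
        for (int i = 8; i < 16; i++) {
            lsb = (lsb << 8) | (hash[i] & 0xff);
        }
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        byte[] name = "example".getBytes(StandardCharsets.UTF_8);
        System.out.println(type3(name).equals(UUID.nameUUIDFromBytes(name)));
    }
}
```

For short names the MD5 computation itself is a single block, so the fixed costs around it (instance lookup, allocation of the digest object and its scratch state) make up a noticeable share of each call.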
Re: RFR: 8259498: Reduce overhead of MD5 and SHA digests
On Tue, 5 Jan 2021 21:51:51 GMT, DellCliff wrote:

>> - The MD5 intrinsics added by [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) show that the `int[] x` isn't actually needed. This also applies to the SHA intrinsics from which the MD5 intrinsic takes inspiration.
>> - Using VarHandles we can simplify the code in `ByteArrayAccess` enough to make it acceptable to use inline and replace the array in MD5 wholesale. This improves performance both in the presence and the absence of the intrinsic optimization.
>> - Doing the exact same thing in the SHA impls would be unwieldy (64+ element arrays), but allocating the array lazily gets most of the speed-up in the presence of an intrinsic while being neutral in its absence.
RFR: 8259498: Reduce overhead of MD5 and SHA digests
- The MD5 intrinsics added by [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) show that the `int[] x` isn't actually needed. This also applies to the SHA intrinsics from which the MD5 intrinsic takes inspiration.
- Using VarHandles we can simplify the code in `ByteArrayAccess` enough to make it acceptable to use inline and replace the array in MD5 wholesale. This improves performance both in the presence and the absence of the intrinsic optimization.
- Doing the exact same thing in the SHA impls would be unwieldy (64+ element arrays), but allocating the array lazily gets most of the speed-up in the presence of an intrinsic while being neutral in its absence.

Baseline:

Benchmark                    (digesterName)  (length)  Cnt     Score     Error   Units
MessageDigests.digest                   MD5        16   15  2714.307 ±  21.133  ops/ms
MessageDigests.digest                   MD5      1024   15   318.087 ±   0.637  ops/ms
MessageDigests.digest                 SHA-1        16   15  1387.266 ±  40.932  ops/ms
MessageDigests.digest                 SHA-1      1024   15   109.273 ±   0.149  ops/ms
MessageDigests.digest               SHA-256        16   15   995.566 ±  21.186  ops/ms
MessageDigests.digest               SHA-256      1024   15    89.104 ±   0.079  ops/ms
MessageDigests.digest               SHA-512        16   15   803.030 ±  15.722  ops/ms
MessageDigests.digest               SHA-512      1024   15   115.611 ±   0.234  ops/ms
MessageDigests.getAndDigest             MD5        16   15  2190.367 ±  97.037  ops/ms
MessageDigests.getAndDigest             MD5      1024   15   302.903 ±   1.809  ops/ms
MessageDigests.getAndDigest           SHA-1        16   15  1262.656 ±  43.751  ops/ms
MessageDigests.getAndDigest           SHA-1      1024   15   104.889 ±   3.554  ops/ms
MessageDigests.getAndDigest         SHA-256        16   15   914.541 ±  55.621  ops/ms
MessageDigests.getAndDigest         SHA-256      1024   15    85.708 ±   1.394  ops/ms
MessageDigests.getAndDigest         SHA-512        16   15   737.719 ±  53.671  ops/ms
MessageDigests.getAndDigest         SHA-512      1024   15   112.307 ±   1.950  ops/ms

GC:

MessageDigests.getAndDigest:·gc.alloc.rate.norm             MD5        16   15   312.011 ±   0.005  B/op
MessageDigests.getAndDigest:·gc.alloc.rate.norm           SHA-1        16   15   584.020 ±   0.006  B/op
MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-256        16   15   544.019 ±   0.016  B/op
MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-512        16   15  1056.037 ±   0.003  B/op

Target:

Benchmark                    (digesterName)  (length)  Cnt     Score     Error   Units
MessageDigests.digest                   MD5        16   15  3134.462 ±  43.685  ops/ms
MessageDigests.digest                   MD5      1024   15   323.667 ±   0.633  ops/ms
MessageDigests.digest                 SHA-1        16   15  1418.742 ±  38.223  ops/ms
MessageDigests.digest                 SHA-1      1024   15   110.178 ±   0.788  ops/ms
MessageDigests.digest               SHA-256        16   15  1037.949 ±  21.214  ops/ms
MessageDigests.digest               SHA-256      1024   15    89.671 ±   0.228  ops/ms
MessageDigests.digest               SHA-512        16   15   812.028 ±  39.489  ops/ms
MessageDigests.digest               SHA-512      1024   15   116.738 ±   0.249  ops/ms
MessageDigests.getAndDigest             MD5        16   15  2314.379 ± 229.294  ops/ms
MessageDigests.getAndDigest             MD5      1024   15   307.835 ±   5.730  ops/ms
MessageDigests.getAndDigest           SHA-1        16   15  1326.887 ±  63.263  ops/ms
MessageDigests.getAndDigest           SHA-1      1024   15   106.611 ±   2.292  ops/ms
MessageDigests.getAndDigest         SHA-256        16   15   961.589 ±  82.052  ops/ms
MessageDigests.getAndDigest         SHA-256      1024   15    88.646 ±   0.194  ops/ms
MessageDigests.getAndDigest         SHA-512        16   15   775.417 ±  56.775  ops/ms
MessageDigests.getAndDigest         SHA-512      1024   15   112.904 ±   2.014  ops/ms

GC:

MessageDigests.getAndDigest:·gc.alloc.rate.norm             MD5        16   15   232.009 ±   0.006  B/op
MessageDigests.getAndDigest:·gc.alloc.rate.norm           SHA-1        16   15   584.021 ±   0.001  B/op
MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-256        16   15   272.012 ±   0.015  B/op
MessageDigests.getAndDigest