- The MD5 intrinsics added by [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) show that the `int[] x` isn't actually needed. This also applies to the SHA intrinsics from which the MD5 intrinsic takes inspiration.
- Using VarHandles we can simplify the code in `ByteArrayAccess` enough to make it acceptable to use inline and replace the array in MD5 wholesale. This improves performance both in the presence and the absence of the intrinsic optimization.
- Doing the exact same thing in the SHA impls would be unwieldy (64+ element arrays), but allocating the array lazily gets most of the speed-up in the presence of an intrinsic while being neutral in its absence.
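To illustrate the two ideas above, here is a minimal sketch and not the code in this patch. It assumes a little-endian `int` view over a `byte[]` obtained from `MethodHandles.byteArrayViewVarHandle`, and a lazily allocated scratch array. The names `BlockReads`, `readIntLE` and `LazyScratch` are hypothetical, and the 64-element size is only illustrative.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

class BlockReads {
    // View a byte[] as little-endian ints. MD5 consumes its 64-byte block
    // as 16 little-endian ints, so each word can be read straight from the
    // input instead of being copied into a temporary int[] first.
    private static final VarHandle LE_INT =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // byteOffset is an index into the byte[]; the VarHandle bounds-checks it.
    static int readIntLE(byte[] buf, int byteOffset) {
        return (int) LE_INT.get(buf, byteOffset);
    }

    // Hypothetical stand-in for a SHA-style implementation: the scratch
    // array is only created if a pure-Java code path asks for it.
    static final class LazyScratch {
        private int[] w; // stays null until first use

        int[] scratch() {
            if (w == null) {
                w = new int[64]; // illustrative size only
            }
            return w;
        }

        boolean allocated() {
            return w != null;
        }
    }

    public static void main(String[] args) {
        byte[] block = new byte[64];
        block[0] = 0x78;
        block[1] = 0x56;
        block[2] = 0x34;
        block[3] = 0x12;
        System.out.println(Integer.toHexString(readIntLE(block, 0))); // prints 12345678
    }
}
```

With reads this cheap, the first word of a block is simply `readIntLE(buf, ofs)`, the second `readIntLE(buf, ofs + 4)`, and so on, which is what makes dropping the temporary array practical for a 16-word block but awkward for 64+ words.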

Baseline:

    Benchmark                    (digesterName)  (length)  Cnt     Score     Error   Units
    MessageDigests.digest                   MD5        16   15  2714.307 ±  21.133  ops/ms
    MessageDigests.digest                   MD5      1024   15   318.087 ±   0.637  ops/ms
    MessageDigests.digest                 SHA-1        16   15  1387.266 ±  40.932  ops/ms
    MessageDigests.digest                 SHA-1      1024   15   109.273 ±   0.149  ops/ms
    MessageDigests.digest               SHA-256        16   15   995.566 ±  21.186  ops/ms
    MessageDigests.digest               SHA-256      1024   15    89.104 ±   0.079  ops/ms
    MessageDigests.digest               SHA-512        16   15   803.030 ±  15.722  ops/ms
    MessageDigests.digest               SHA-512      1024   15   115.611 ±   0.234  ops/ms
    MessageDigests.getAndDigest             MD5        16   15  2190.367 ±  97.037  ops/ms
    MessageDigests.getAndDigest             MD5      1024   15   302.903 ±   1.809  ops/ms
    MessageDigests.getAndDigest           SHA-1        16   15  1262.656 ±  43.751  ops/ms
    MessageDigests.getAndDigest           SHA-1      1024   15   104.889 ±   3.554  ops/ms
    MessageDigests.getAndDigest         SHA-256        16   15   914.541 ±  55.621  ops/ms
    MessageDigests.getAndDigest         SHA-256      1024   15    85.708 ±   1.394  ops/ms
    MessageDigests.getAndDigest         SHA-512        16   15   737.719 ±  53.671  ops/ms
    MessageDigests.getAndDigest         SHA-512      1024   15   112.307 ±   1.950  ops/ms

GC:

    MessageDigests.getAndDigest:·gc.alloc.rate.norm      MD5  16  15   312.011 ± 0.005  B/op
    MessageDigests.getAndDigest:·gc.alloc.rate.norm    SHA-1  16  15   584.020 ± 0.006  B/op
    MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-256  16  15   544.019 ± 0.016  B/op
    MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-512  16  15  1056.037 ± 0.003  B/op

Target:

    Benchmark                    (digesterName)  (length)  Cnt     Score     Error   Units
    MessageDigests.digest                   MD5        16   15  3134.462 ±  43.685  ops/ms
    MessageDigests.digest                   MD5      1024   15   323.667 ±   0.633  ops/ms
    MessageDigests.digest                 SHA-1        16   15  1418.742 ±  38.223  ops/ms
    MessageDigests.digest                 SHA-1      1024   15   110.178 ±   0.788  ops/ms
    MessageDigests.digest               SHA-256        16   15  1037.949 ±  21.214  ops/ms
    MessageDigests.digest               SHA-256      1024   15    89.671 ±   0.228  ops/ms
    MessageDigests.digest               SHA-512        16   15   812.028 ±  39.489  ops/ms
    MessageDigests.digest               SHA-512      1024   15   116.738 ±   0.249  ops/ms
    MessageDigests.getAndDigest             MD5        16   15  2314.379 ± 229.294  ops/ms
    MessageDigests.getAndDigest             MD5      1024   15   307.835 ±   5.730  ops/ms
    MessageDigests.getAndDigest           SHA-1        16   15  1326.887 ±  63.263  ops/ms
    MessageDigests.getAndDigest           SHA-1      1024   15   106.611 ±   2.292  ops/ms
    MessageDigests.getAndDigest         SHA-256        16   15   961.589 ±  82.052  ops/ms
    MessageDigests.getAndDigest         SHA-256      1024   15    88.646 ±   0.194  ops/ms
    MessageDigests.getAndDigest         SHA-512        16   15   775.417 ±  56.775  ops/ms
    MessageDigests.getAndDigest         SHA-512      1024   15   112.904 ±   2.014  ops/ms

GC:

    MessageDigests.getAndDigest:·gc.alloc.rate.norm      MD5  16  15  232.009 ± 0.006  B/op
    MessageDigests.getAndDigest:·gc.alloc.rate.norm    SHA-1  16  15  584.021 ± 0.001  B/op
    MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-256  16  15  272.012 ± 0.015  B/op
    MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-512  16  15  400.017 ± 0.019  B/op

For the `digest` micro, digesting small inputs is faster with all algorithms, ranging from ~1% for SHA-512 up to ~15% for MD5. The gain stems from no longer allocating a temporary buffer and reading the input into it outside of the intrinsic. SHA-1 does not see a statistically significant gain because the intrinsic is disabled by default on my HW.

For the `getAndDigest` micro - which tests `MessageDigest.getInstance(..).digest(..)` - there are similar gains with this patch. The interesting aspect here is verifying the reduction in allocations per operation when there's an active intrinsic (again, not for SHA-1). JDK-8259065 (#1933) reduced allocations on each of these by 144 B/op, which means allocation pressure for SHA-512 is down two thirds, from 1200 B/op to 400 B/op, in this contrived test.

I've verified there are no regressions in the absence of the intrinsic - which the SHA-1 numbers here help show.

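For readers who want to see the shape of the `getAndDigest` operation and check the allocation arithmetic, here is a small sketch. It is plain Java rather than the JMH micro itself; the class `GetAndDigestSketch` and its helper names are hypothetical, and the figures passed to `reductionPercent` are the rounded B/op numbers quoted above.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class GetAndDigestSketch {
    // The pattern the getAndDigest micro exercises: look up a fresh
    // MessageDigest instance and digest the input in a single call.
    static byte[] getAndDigest(String algorithm, byte[] input) {
        try {
            return MessageDigest.getInstance(algorithm).digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Algorithm not available: " + algorithm, e);
        }
    }

    // Relative reduction in percent when going from 'before' to 'after'.
    static double reductionPercent(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        byte[] input = new byte[16]; // same size as the small-input case
        for (String alg : new String[] {"MD5", "SHA-1", "SHA-256", "SHA-512"}) {
            System.out.println(alg + ": " + getAndDigest(alg, input).length + " byte digest");
        }
        // SHA-512: 1056 B/op baseline plus the 144 B/op already saved by
        // JDK-8259065 gives 1200 B/op, compared with 400 B/op after this patch.
        System.out.printf("SHA-512 allocation reduction: %.1f%%%n",
                reductionPercent(1056 + 144, 400));
    }
}
```

The last line evaluates (1200 - 400) / 1200, which is the "two thirds" figure quoted above.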
-------------

Commit messages:
 - Remove unused Unsafe import
 - Harmonize MD4 impl, remove now-redundant checks from ByteArrayAccess (VHs do bounds checks, most of which will be optimized away)
 - Merge branch 'master' into improve_md5
 - Apply allocation avoiding optimizations to all SHA versions sharing structural similarities with MD5
 - Remove unused reverseBytes imports
 - Copyrights
 - Fix copy-paste error
 - Various fixes (IDE stopped IDEing..)
 - Add imports
 - mismatched parens
 - ... and 8 more: https://git.openjdk.java.net/jdk/compare/090bd3af...e1c943c5

Changes: https://git.openjdk.java.net/jdk/pull/1855/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1855&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8259498
  Stats: 649 lines in 8 files changed: 83 ins; 344 del; 222 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1855.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1855/head:pull/1855

PR: https://git.openjdk.java.net/jdk/pull/1855