Integrated: 8333867: SHA3 performance can be improved
On Mon, 10 Jun 2024 15:01:55 GMT, Ferenc Rakoczi wrote: > This PR removes some unnecessary conversions between byte arrays and long > arrays during SHA3 digest computations. This pull request has now been integrated. Changeset: 75bea280 Author:Ferenc Rakoczi Committer: Weijun Wang URL: https://git.openjdk.org/jdk/commit/75bea280b9adb6dac9fefafbb3f4b212f100fbb5 Stats: 86 lines in 3 files changed: 24 ins; 35 del; 27 mod 8333867: SHA3 performance can be improved Reviewed-by: kvn, valeriep - PR: https://git.openjdk.org/jdk/pull/19632
Re: RFR: 8333867: SHA3 performance can be improved [v4]
> This PR removes some unnecessary conversions between byte arrays and long > arrays during SHA3 digest computations. Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Copyright year update. - Changes: - all: https://git.openjdk.org/jdk/pull/19632/files - new: https://git.openjdk.org/jdk/pull/19632/files/4fcdd09f..2bcd3000 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19632&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19632&range=02-03 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19632.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19632/head:pull/19632 PR: https://git.openjdk.org/jdk/pull/19632
Re: RFR: 8333867: SHA3 performance can be improved [v3]
> This PR removes some unnecessary conversions between byte arrays and long > arrays during SHA3 digest computations. Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Accept more review suggestions - Changes: - all: https://git.openjdk.org/jdk/pull/19632/files - new: https://git.openjdk.org/jdk/pull/19632/files/0a9f9624..4fcdd09f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19632&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19632&range=01-02 Stats: 21 lines in 1 file changed: 2 ins; 8 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/19632.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19632/head:pull/19632 PR: https://git.openjdk.org/jdk/pull/19632
Re: RFR: 8333867: SHA3 performance can be improved [v2]
On Thu, 13 Jun 2024 20:25:22 GMT, Valerie Peng wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Fix clone(), accept review suggestions. > > src/java.base/share/classes/sun/security/provider/SHA3.java line 73: > >> 71: // The following array is allocated to size WIDTH bytes, but we only >> 72: // ever use the first blockSize bytes it (for bytes <-> long >> conversions) >> 73: private byte[] byteState = new byte[WIDTH]; > > Since we are storing the state in longs now, this "byte <-> long" conversion > can be made through a local variable, right? Is there a reason for having > this `byteState` field with size WIDTH bytes? This is interesting: if I use WIDTH (or in blockSize) long arrays in the local level, the performance drops a few per cents. Even more when I only declare the local array in the block in which it is used. However, since we really only need 8 bytes, if I allocate that at the beginning of the function, I don't see that performance drop. So I rewrote the output loop in the function and got rid of the class level declaration. > src/java.base/share/classes/sun/security/provider/SHA3.java line 121: > >> 119: } >> 120: implCompress(buffer, 0); >> 121: int availableBytes = buffer.length; > > change to `blockSize` as in `implCompress0(...)`? Yes, thanks, I wanted to, just forgot. - PR Review Comment: https://git.openjdk.org/jdk/pull/19632#discussion_r1639572500 PR Review Comment: https://git.openjdk.org/jdk/pull/19632#discussion_r1639572413
Re: RFR: 8333867: SHA3 performance can be improved [v2]
On Fri, 14 Jun 2024 05:56:05 GMT, Andrey Turbanov wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Fix clone(), accept review suggestions. > > src/java.base/share/classes/sun/security/provider/SHA3.java line 152: > >> 150: */ >> 151: void implReset() { >> 152: Arrays.fill(state,0L); > > Suggestion: > > Arrays.fill(state, 0L); Thanks! Fixed. - PR Review Comment: https://git.openjdk.org/jdk/pull/19632#discussion_r1639572679
Re: RFR: 8333867: SHA3 performance can be improved [v2]
On Wed, 12 Jun 2024 14:08:43 GMT, Ferenc Rakoczi wrote: >> This PR removes some unnecessary conversions between byte arrays and long >> arrays during SHA3 digest computations. > > Ferenc Rakoczi has updated the pull request incrementally with one additional > commit since the last revision: > > Fix clone(), accept review suggestions. tier1,2,3 mach5 tests all passed. - PR Comment: https://git.openjdk.org/jdk/pull/19632#issuecomment-2163807516
Re: RFR: 8333867: SHA3 performance can be improved [v2]
On Tue, 11 Jun 2024 18:21:49 GMT, Daniel Jeliński wrote: >> okay > > Could you try using MethodHandles instead of b2lLittle? Similar to what's > used in ChaCha20: > https://github.com/openjdk/jdk/blob/b77bd5fd6a6f7ddbed90300fba790da4fb683275/src/java.base/share/classes/com/sun/crypto/provider/ChaCha20Cipher.java#L132-L134 > > with MethodHandles you could drop longBuf, removing unnecessary allocation on > the only platform where the intrinsic is implemented (MacOSX-aarch64) b2lLittle() and l2bLittle() are using those MethodHandle stuff, but yes, it adds some more performance if I use them directly in this class, so I changed it. - PR Review Comment: https://git.openjdk.org/jdk/pull/19632#discussion_r1637007344
Re: RFR: 8333867: SHA3 performance can be improved [v2]
> This PR removes some unnecessary conversions between byte arrays and long > arrays during SHA3 digest computations. Ferenc Rakoczi has updated the pull request incrementally with one additional commit since the last revision: Fix clone(), accept review suggestions. - Changes: - all: https://git.openjdk.org/jdk/pull/19632/files - new: https://git.openjdk.org/jdk/pull/19632/files/873164ad..0a9f9624 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19632&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19632&range=00-01 Stats: 34 lines in 1 file changed: 23 ins; 4 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/19632.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19632/head:pull/19632 PR: https://git.openjdk.org/jdk/pull/19632
Re: RFR: 8333867: SHA3 performance can be improved
On Tue, 11 Jun 2024 18:18:41 GMT, Daniel Jeliński wrote: >> This PR removes some unnecessary conversions between byte arrays and long >> arrays during SHA3 digest computations. > > src/java.base/share/classes/sun/security/provider/SHA3.java line 70: > >> 68: private long[] state = new long[DM*DM]; >> 69: >> 70: // The following two arrays are allocated to size WIDTH bytes, but >> we only > > Only one of the following arrays is of WIDTH length... outdated comment? Well, DM*DM*8 is WIDTH (DM=5 and WIDTH=200), so both arrays have 200 bytes, But I will update the comment to make it clearer. - PR Review Comment: https://git.openjdk.org/jdk/pull/19632#discussion_r1635419497
Re: RFR: 8252204: AArch64: Implement SHA3 accelerator/intrinsic [v11]
On Wed, 2 Aug 2023 11:02:07 GMT, Yudi Zheng wrote: >> Fei Yang has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Add if (isJDK16OrHigher()) check for SHA3 in CheckGraalIntrinsics.java > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3473: > >> 3471: __ bcax(v24, __ T16B, v24, v8, v31); >> 3472: >> 3473: __ ld1r(v31, __ T2D, __ post(rscratch1, 8)); > > is it intentional to load 16 bytes and post-increment by 8? Actually, with the ld1r instruction the post increment should be the same as the size of the memory accessed. So T2D requires 8 as it reads 8 bytes(and duplicates it into both halves of the SIMD register). - PR Review Comment: https://git.openjdk.org/jdk/pull/207#discussion_r1281840355