Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5]
On Wed, 17 Apr 2024 12:35:54 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic >> candidate to its caller simplifies the intrinsic implementation in JIT >> compiler. > > Yudi Zheng has updated the pull request incrementally with one additional > commit since the last revision: > > address comment. I think you'll want to ask port maintainers for aarch64/arm/ppc/riscv/s390 to review and test those changes. There may be some opportunities for minor improvements, but those could be done later. For example, we are computing `zlen` for the squareToLen stub even though the value is unused. And both x86 and aarch64 seem to have unneeded save/restore code, even though I think all these registers are killed when called by a C2 runtime stub. - PR Comment: https://git.openjdk.org/jdk/pull/18226#issuecomment-2062138149
Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5]
On Wed, 17 Apr 2024 12:35:54 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic >> candidate to its caller simplifies the intrinsic implementation in JIT >> compiler. > > Yudi Zheng has updated the pull request incrementally with one additional > commit since the last revision: > > address comment. src/hotspot/cpu/x86/macroAssembler_x86.cpp line 6662: > 6660: push(tmp5); > 6661: > 6662: push(xlen); There may be an opportunity here (separate RFE?) to get rid of the save/restore for these. I don't think it's necessary if this is called as part of a C2 stub. - PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569452818
Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5]
On Wed, 17 Apr 2024 19:45:02 GMT, Dean Long wrote: >> Yudi Zheng has updated the pull request incrementally with one additional >> commit since the last revision: >> >> address comment. > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4702: > >> 4700: const Register tmp5 = r15; >> 4701: const Register tmp6 = r16; >> 4702: const Register tmp7 = r17; > > No need for r17 or sorting tmps. Make tmp0 r3, or r6, r7, etc. Also, I don't see why the code below saves and restores r4/r5. Maybe @theRealAph knows? Aren't all these registers killed across a runtime call? - PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569427241
Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5]
On Wed, 17 Apr 2024 12:35:54 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic >> candidate to its caller simplifies the intrinsic implementation in JIT >> compiler. > > Yudi Zheng has updated the pull request incrementally with one additional > commit since the last revision: > > address comment. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4693: > 4691: const Register xlen = r1; > 4692: const Register z = r2; > 4693: const Register zlen = r3; LibraryCallKit::inline_squareToLen() is still computing zlen and passing it as the 4th arg, even though the value is unused. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4702: > 4700: const Register tmp5 = r15; > 4701: const Register tmp6 = r16; > 4702: const Register tmp7 = r17; No need for r17 or sorting tmps. Make tmp0 r3, or r6, r7, etc. - PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569419199 PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569420732
Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5]
On Wed, 17 Apr 2024 12:35:54 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic >> candidate to its caller simplifies the intrinsic implementation in JIT >> compiler. > > Yudi Zheng has updated the pull request incrementally with one additional > commit since the last revision: > > address comment. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4670: > 4668: const Register tmp5 = r15; > 4669: const Register tmp6 = r16; > 4670: const Register tmp7 = r17; Why not minimize changes and continue to use r5 for tmp0? I see no need for r17 or to reassign all the other tmp registers. - PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569401544
Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v3]
On Tue, 26 Mar 2024 15:59:33 GMT, Damon Fenacci wrote: >> Yudi Zheng has updated the pull request incrementally with one additional >> commit since the last revision: >> >> address comment. > > `multiply_to_len` seems to be used by `generate_squareToLen` as well for > aarch64 and riscv but `zlen` is still passed in a register. > > https://github.com/openjdk/jdk/blob/870a6127cf54264c691f7322d775b202705c3bfa/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4710 > https://github.com/openjdk/jdk/blob/870a6127cf54264c691f7322d775b202705c3bfa/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L2881 > > I think it might work anyway but it might be better to adapt them if only for > completeness. > @dafedafe @dean-long please take a look and let me know if there are further > issues, thanks! Thanks @mur47x111! I noticed that you found even a few more `zlen` usages Did you test the change against all affected platforms? - PR Comment: https://git.openjdk.org/jdk/pull/18226#issuecomment-2061301421
Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v3]
On Tue, 26 Mar 2024 15:59:33 GMT, Damon Fenacci wrote: >> Yudi Zheng has updated the pull request incrementally with one additional >> commit since the last revision: >> >> address comment. > > `multiply_to_len` seems to be used by `generate_squareToLen` as well for > aarch64 and riscv but `zlen` is still passed in a register. > > https://github.com/openjdk/jdk/blob/870a6127cf54264c691f7322d775b202705c3bfa/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4710 > https://github.com/openjdk/jdk/blob/870a6127cf54264c691f7322d775b202705c3bfa/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L2881 > > I think it might work anyway but it might be better to adapt them if only for > completeness. @dafedafe @dean-long please take a look and let me know if there are further issues, thanks! - PR Comment: https://git.openjdk.org/jdk/pull/18226#issuecomment-2061162283
Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5]
> Moving array construction within BigInteger.implMultiplyToLen intrinsic > candidate to its caller simplifies the intrinsic implementation in JIT > compiler. Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: address comment. - Changes: - all: https://git.openjdk.org/jdk/pull/18226/files - new: https://git.openjdk.org/jdk/pull/18226/files/c5d521dc..72ba58ce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=18226=04 - incr: https://webrevs.openjdk.org/?repo=jdk=18226=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18226.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18226/head:pull/18226 PR: https://git.openjdk.org/jdk/pull/18226
Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v4]
> Moving array construction within BigInteger.implMultiplyToLen intrinsic > candidate to its caller simplifies the intrinsic implementation in JIT > compiler. Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: address comment. - Changes: - all: https://git.openjdk.org/jdk/pull/18226/files - new: https://git.openjdk.org/jdk/pull/18226/files/870a6127..c5d521dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=18226=03 - incr: https://webrevs.openjdk.org/?repo=jdk=18226=02-03 Stats: 53 lines in 10 files changed: 3 ins; 6 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/18226.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18226/head:pull/18226 PR: https://git.openjdk.org/jdk/pull/18226
Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v3]
On Tue, 19 Mar 2024 21:09:31 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic >> candidate to its caller simplifies the intrinsic implementation in JIT >> compiler. > > Yudi Zheng has updated the pull request incrementally with one additional > commit since the last revision: > > address comment. `multiply_to_len` seems to be used by `generate_squareToLen` as well for aarch64 and riscv but `zlen` is still passed in a register. https://github.com/openjdk/jdk/blob/870a6127cf54264c691f7322d775b202705c3bfa/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4710 https://github.com/openjdk/jdk/blob/870a6127cf54264c691f7322d775b202705c3bfa/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L2881 I think it might work anyway but it might be better to adapt them if only for completeness. - PR Review: https://git.openjdk.org/jdk/pull/18226#pullrequestreview-1960906919
Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v3]
> Moving array construction within BigInteger.implMultiplyToLen intrinsic > candidate to its caller simplifies the intrinsic implementation in JIT > compiler. Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: address comment. - Changes: - all: https://git.openjdk.org/jdk/pull/18226/files - new: https://git.openjdk.org/jdk/pull/18226/files/bfc323b7..870a6127 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=18226=02 - incr: https://webrevs.openjdk.org/?repo=jdk=18226=01-02 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18226.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18226/head:pull/18226 PR: https://git.openjdk.org/jdk/pull/18226
Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v2]
On Tue, 19 Mar 2024 19:06:36 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic >> candidate to its caller simplifies the intrinsic implementation in JIT >> compiler. > > Yudi Zheng has updated the pull request incrementally with one additional > commit since the last revision: > > address comment. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4670: > 4668: const Register tmp6 = r15; > 4669: const Register tmp7 = r16; > 4670: const Register tmp8 = r17; It looks like tmp8 is never used. The call to multiply_to_len() below needs to be adjusted. - PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1530986811
Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v2]
On Tue, 19 Mar 2024 19:06:36 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic >> candidate to its caller simplifies the intrinsic implementation in JIT >> compiler. > > Yudi Zheng has updated the pull request incrementally with one additional > commit since the last revision: > > address comment. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 3559: > 3557: Register tmp5, Register tmp6, > Register product_hi, Register tmp8) { > 3558: > 3559: assert_different_registers(x, xlen, y, ylen, z, tmp1, tmp2, tmp3, > tmp4, tmp5, tmp6, tmp8); Suggestion: assert_different_registers(x, xlen, y, ylen, z, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp8, product_hi); - PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1530980566
Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v2]
> Moving array construction within BigInteger.implMultiplyToLen intrinsic > candidate to its caller simplifies the intrinsic implementation in JIT > compiler. Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: address comment. - Changes: - all: https://git.openjdk.org/jdk/pull/18226/files - new: https://git.openjdk.org/jdk/pull/18226/files/37232a5f..bfc323b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=18226=01 - incr: https://webrevs.openjdk.org/?repo=jdk=18226=00-01 Stats: 70 lines in 11 files changed: 8 ins; 28 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/18226.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18226/head:pull/18226 PR: https://git.openjdk.org/jdk/pull/18226
Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v2]
On Mon, 18 Mar 2024 16:55:28 GMT, Damon Fenacci wrote: > Quite a simplification! Have you checked if there are any performance > differences? Ran https://github.com/oracle/graal/blob/master/compiler/src/org.graalvm.micro.benchmarks/src/micro/benchmarks/BigIntegerBenchmark.java The results are $ before change Benchmark Mode CntScore Error Units BigIntegerBenchmark.bigIntMontgomeryMul thrpt5 122.488 ± 0.130 ops/s BigIntegerBenchmark.bigIntMontgomerySqr thrpt5 76.023 ± 0.106 ops/s BigIntegerBenchmark.bigIntMulthrpt5 330.130 ± 0.349 ops/s BigIntegerBenchmark.bigIntMulAdd thrpt5 455.590 ± 0.663 ops/s $ after change Benchmark Mode CntScore Error Units BigIntegerBenchmark.bigIntMontgomeryMul thrpt5 124.407 ± 0.045 ops/s BigIntegerBenchmark.bigIntMontgomerySqr thrpt5 76.036 ± 0.232 ops/s BigIntegerBenchmark.bigIntMulthrpt5 329.836 ± 0.953 ops/s BigIntegerBenchmark.bigIntMulAdd thrpt5 456.485 ± 0.766 ops/s - PR Comment: https://git.openjdk.org/jdk/pull/18226#issuecomment-2007922439
Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic
On Tue, 12 Mar 2024 10:44:54 GMT, Yudi Zheng wrote: > Moving array construction within BigInteger.implMultiplyToLen intrinsic > candidate to its caller simplifies the intrinsic implementation in JIT > compiler. src/hotspot/share/opto/library_call.cpp line 5934: > 5932: // 'y_start' points to y array + scaled ylen > 5933: > 5934: Node* zlen = _gvn.transform(new AddINode(xlen, ylen)); Would could generate one less instruction in the code cache if we did this `add` in the native runtime function. - PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1529102070
Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic
On Tue, 12 Mar 2024 10:44:54 GMT, Yudi Zheng wrote: > Moving array construction within BigInteger.implMultiplyToLen intrinsic > candidate to its caller simplifies the intrinsic implementation in JIT > compiler. src/java.base/share/classes/java/math/BigInteger.java line 1836: > 1834: > 1835: if (z == null || z.length < (xlen+ ylen)) > 1836: z = new int[xlen+ylen]; Spaces before and after "+" please. Tnx - PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1528984661
Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic
On Tue, 12 Mar 2024 10:44:54 GMT, Yudi Zheng wrote: > Moving array construction within BigInteger.implMultiplyToLen intrinsic > candidate to its caller simplifies the intrinsic implementation in JIT > compiler. Quite a simplification! Have you checked if there are any performance differences? - PR Review: https://git.openjdk.org/jdk/pull/18226#pullrequestreview-1943670073
RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic
Moving array construction within BigInteger.implMultiplyToLen intrinsic candidate to its caller simplifies the intrinsic implementation in JIT compiler. - Commit messages: - Simplify BigInteger.implMultiplyToLen intrinsic Changes: https://git.openjdk.org/jdk/pull/18226/files Webrev: https://webrevs.openjdk.org/?repo=jdk=18226=00 Issue: https://bugs.openjdk.org/browse/JDK-8327964 Stats: 53 lines in 2 files changed: 4 ins; 49 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18226.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18226/head:pull/18226 PR: https://git.openjdk.org/jdk/pull/18226