Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5]

2024-04-17 Thread Dean Long
On Wed, 17 Apr 2024 12:35:54 GMT, Yudi Zheng  wrote:

>> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
>> candidate to its caller simplifies the intrinsic implementation in JIT 
>> compiler.
>
> Yudi Zheng has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   address comment.

I think you'll want to ask port maintainers for aarch64/arm/ppc/riscv/s390 to 
review and test those changes.
There may be some opportunities for minor improvements, but those could be done 
later.  For example, we are computing `zlen` for the squareToLen stub even 
though the value is unused.  And both x86 and aarch64 seem to have unneeded 
save/restore code, even though I think all these registers are killed when 
called by a C2 runtime stub.

-

PR Comment: https://git.openjdk.org/jdk/pull/18226#issuecomment-2062138149


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5]

2024-04-17 Thread Dean Long
On Wed, 17 Apr 2024 12:35:54 GMT, Yudi Zheng  wrote:

>> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
>> candidate to its caller simplifies the intrinsic implementation in JIT 
>> compiler.
>
> Yudi Zheng has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   address comment.

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 6662:

> 6660:   push(tmp5);
> 6661: 
> 6662:   push(xlen);

There may be an opportunity here (separate RFE?) to get rid of the save/restore 
for these.  I don't think it's necessary if this is called as part of a C2 stub.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569452818


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5]

2024-04-17 Thread Dean Long
On Wed, 17 Apr 2024 19:45:02 GMT, Dean Long  wrote:

>> Yudi Zheng has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   address comment.
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4702:
> 
>> 4700: const Register tmp5  = r15;
>> 4701: const Register tmp6  = r16;
>> 4702: const Register tmp7  = r17;
> 
> No need for r17 or sorting tmps.  Make tmp0 r3, or r6, r7, etc.

Also, I don't see why the code below saves and restores r4/r5.  Maybe 
@theRealAph knows?  Aren't all these registers killed across a runtime call?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569427241


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5]

2024-04-17 Thread Dean Long
On Wed, 17 Apr 2024 12:35:54 GMT, Yudi Zheng  wrote:

>> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
>> candidate to its caller simplifies the intrinsic implementation in JIT 
>> compiler.
>
> Yudi Zheng has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   address comment.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4693:

> 4691: const Register xlen  = r1;
> 4692: const Register z = r2;
> 4693: const Register zlen  = r3;

LibraryCallKit::inline_squareToLen() is still computing zlen and passing it as 
the 4th arg, even though the value is unused.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4702:

> 4700: const Register tmp5  = r15;
> 4701: const Register tmp6  = r16;
> 4702: const Register tmp7  = r17;

No need for r17 or sorting tmps.  Make tmp0 r3, or r6, r7, etc.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569419199
PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569420732


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5]

2024-04-17 Thread Dean Long
On Wed, 17 Apr 2024 12:35:54 GMT, Yudi Zheng  wrote:

>> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
>> candidate to its caller simplifies the intrinsic implementation in JIT 
>> compiler.
>
> Yudi Zheng has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   address comment.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4670:

> 4668: const Register tmp5  = r15;
> 4669: const Register tmp6  = r16;
> 4670: const Register tmp7  = r17;

Why not minimize changes and continue to use r5 for tmp0?  I see no need for 
r17 or to reassign all the other tmp registers.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1569401544


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v3]

2024-04-17 Thread Damon Fenacci
On Tue, 26 Mar 2024 15:59:33 GMT, Damon Fenacci  wrote:

>> Yudi Zheng has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   address comment.
>
> `multiply_to_len` seems to be used by `generate_squareToLen` as well for 
> aarch64 and riscv but `zlen` is still passed in a register.
> 
> https://github.com/openjdk/jdk/blob/870a6127cf54264c691f7322d775b202705c3bfa/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4710
> https://github.com/openjdk/jdk/blob/870a6127cf54264c691f7322d775b202705c3bfa/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L2881
> 
> I think it might work anyway but it might be better to adapt them if only for 
> completeness.

> @dafedafe @dean-long please take a look and let me know if there are further 
> issues, thanks!

Thanks @mur47x111! I noticed that you found even a few more `zlen` usages  

Did you test the change against all affected platforms?

-

PR Comment: https://git.openjdk.org/jdk/pull/18226#issuecomment-2061301421


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v3]

2024-04-17 Thread Yudi Zheng
On Tue, 26 Mar 2024 15:59:33 GMT, Damon Fenacci  wrote:

>> Yudi Zheng has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   address comment.
>
> `multiply_to_len` seems to be used by `generate_squareToLen` as well for 
> aarch64 and riscv but `zlen` is still passed in a register.
> 
> https://github.com/openjdk/jdk/blob/870a6127cf54264c691f7322d775b202705c3bfa/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4710
> https://github.com/openjdk/jdk/blob/870a6127cf54264c691f7322d775b202705c3bfa/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L2881
> 
> I think it might work anyway but it might be better to adapt them if only for 
> completeness.

@dafedafe @dean-long please take a look and let me know if there are further 
issues, thanks!

-

PR Comment: https://git.openjdk.org/jdk/pull/18226#issuecomment-2061162283


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v5]

2024-04-17 Thread Yudi Zheng
> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
> candidate to its caller simplifies the intrinsic implementation in JIT 
> compiler.

Yudi Zheng has updated the pull request incrementally with one additional 
commit since the last revision:

  address comment.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/18226/files
  - new: https://git.openjdk.org/jdk/pull/18226/files/c5d521dc..72ba58ce

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=18226=04
 - incr: https://webrevs.openjdk.org/?repo=jdk=18226=03-04

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/18226.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18226/head:pull/18226

PR: https://git.openjdk.org/jdk/pull/18226


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v4]

2024-04-17 Thread Yudi Zheng
> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
> candidate to its caller simplifies the intrinsic implementation in JIT 
> compiler.

Yudi Zheng has updated the pull request incrementally with one additional 
commit since the last revision:

  address comment.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/18226/files
  - new: https://git.openjdk.org/jdk/pull/18226/files/870a6127..c5d521dc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=18226=03
 - incr: https://webrevs.openjdk.org/?repo=jdk=18226=02-03

  Stats: 53 lines in 10 files changed: 3 ins; 6 del; 44 mod
  Patch: https://git.openjdk.org/jdk/pull/18226.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18226/head:pull/18226

PR: https://git.openjdk.org/jdk/pull/18226


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v3]

2024-03-26 Thread Damon Fenacci
On Tue, 19 Mar 2024 21:09:31 GMT, Yudi Zheng  wrote:

>> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
>> candidate to its caller simplifies the intrinsic implementation in JIT 
>> compiler.
>
> Yudi Zheng has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   address comment.

`multiply_to_len` seems to be used by `generate_squareToLen` as well for 
aarch64 and riscv but `zlen` is still passed in a register.

https://github.com/openjdk/jdk/blob/870a6127cf54264c691f7322d775b202705c3bfa/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4710
https://github.com/openjdk/jdk/blob/870a6127cf54264c691f7322d775b202705c3bfa/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L2881

I think it might work anyway but it might be better to adapt them if only for 
completeness.

-

PR Review: https://git.openjdk.org/jdk/pull/18226#pullrequestreview-1960906919


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v3]

2024-03-19 Thread Yudi Zheng
> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
> candidate to its caller simplifies the intrinsic implementation in JIT 
> compiler.

Yudi Zheng has updated the pull request incrementally with one additional 
commit since the last revision:

  address comment.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/18226/files
  - new: https://git.openjdk.org/jdk/pull/18226/files/bfc323b7..870a6127

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=18226=02
 - incr: https://webrevs.openjdk.org/?repo=jdk=18226=01-02

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/18226.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18226/head:pull/18226

PR: https://git.openjdk.org/jdk/pull/18226


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v2]

2024-03-19 Thread Dean Long
On Tue, 19 Mar 2024 19:06:36 GMT, Yudi Zheng  wrote:

>> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
>> candidate to its caller simplifies the intrinsic implementation in JIT 
>> compiler.
>
> Yudi Zheng has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   address comment.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 4670:

> 4668: const Register tmp6  = r15;
> 4669: const Register tmp7  = r16;
> 4670: const Register tmp8  = r17;

It looks like tmp8 is never used.  The call to multiply_to_len() below needs to 
be adjusted.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1530986811


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v2]

2024-03-19 Thread Dean Long
On Tue, 19 Mar 2024 19:06:36 GMT, Yudi Zheng  wrote:

>> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
>> candidate to its caller simplifies the intrinsic implementation in JIT 
>> compiler.
>
> Yudi Zheng has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   address comment.

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 3559:

> 3557:  Register tmp5, Register tmp6, 
> Register product_hi, Register tmp8) {
> 3558: 
> 3559:   assert_different_registers(x, xlen, y, ylen, z, tmp1, tmp2, tmp3, 
> tmp4, tmp5, tmp6, tmp8);

Suggestion:

  assert_different_registers(x, xlen, y, ylen, z, tmp1, tmp2, tmp3, tmp4, tmp5, 
tmp6, tmp8, product_hi);

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1530980566


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v2]

2024-03-19 Thread Yudi Zheng
> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
> candidate to its caller simplifies the intrinsic implementation in JIT 
> compiler.

Yudi Zheng has updated the pull request incrementally with one additional 
commit since the last revision:

  address comment.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/18226/files
  - new: https://git.openjdk.org/jdk/pull/18226/files/37232a5f..bfc323b7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=18226=01
 - incr: https://webrevs.openjdk.org/?repo=jdk=18226=00-01

  Stats: 70 lines in 11 files changed: 8 ins; 28 del; 34 mod
  Patch: https://git.openjdk.org/jdk/pull/18226.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18226/head:pull/18226

PR: https://git.openjdk.org/jdk/pull/18226


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v2]

2024-03-19 Thread Yudi Zheng
On Mon, 18 Mar 2024 16:55:28 GMT, Damon Fenacci  wrote:

> Quite a simplification! Have you checked if there are any performance 
> differences?

Ran 
https://github.com/oracle/graal/blob/master/compiler/src/org.graalvm.micro.benchmarks/src/micro/benchmarks/BigIntegerBenchmark.java
The results are

$ before change
Benchmark Mode  CntScore   Error  Units
BigIntegerBenchmark.bigIntMontgomeryMul  thrpt5  122.488 ± 0.130  ops/s
BigIntegerBenchmark.bigIntMontgomerySqr  thrpt5   76.023 ± 0.106  ops/s
BigIntegerBenchmark.bigIntMulthrpt5  330.130 ± 0.349  ops/s
BigIntegerBenchmark.bigIntMulAdd thrpt5  455.590 ± 0.663  ops/s

$ after change
Benchmark Mode  CntScore   Error  Units
BigIntegerBenchmark.bigIntMontgomeryMul  thrpt5  124.407 ± 0.045  ops/s
BigIntegerBenchmark.bigIntMontgomerySqr  thrpt5   76.036 ± 0.232  ops/s
BigIntegerBenchmark.bigIntMulthrpt5  329.836 ± 0.953  ops/s
BigIntegerBenchmark.bigIntMulAdd thrpt5  456.485 ± 0.766  ops/s

-

PR Comment: https://git.openjdk.org/jdk/pull/18226#issuecomment-2007922439


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic

2024-03-18 Thread Dean Long
On Tue, 12 Mar 2024 10:44:54 GMT, Yudi Zheng  wrote:

> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
> candidate to its caller simplifies the intrinsic implementation in JIT 
> compiler.

src/hotspot/share/opto/library_call.cpp line 5934:

> 5932: // 'y_start' points to y array + scaled ylen
> 5933: 
> 5934: Node* zlen = _gvn.transform(new AddINode(xlen, ylen));

Would could generate one less instruction in the code cache if we did this 
`add` in the native runtime function.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1529102070


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic

2024-03-18 Thread Roger Riggs
On Tue, 12 Mar 2024 10:44:54 GMT, Yudi Zheng  wrote:

> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
> candidate to its caller simplifies the intrinsic implementation in JIT 
> compiler.

src/java.base/share/classes/java/math/BigInteger.java line 1836:

> 1834: 
> 1835: if (z == null || z.length < (xlen+ ylen))
> 1836:  z = new int[xlen+ylen];

Spaces before and after "+" please.  Tnx

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18226#discussion_r1528984661


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic

2024-03-18 Thread Damon Fenacci
On Tue, 12 Mar 2024 10:44:54 GMT, Yudi Zheng  wrote:

> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
> candidate to its caller simplifies the intrinsic implementation in JIT 
> compiler.

Quite a simplification!
Have you checked if there are any performance differences?

-

PR Review: https://git.openjdk.org/jdk/pull/18226#pullrequestreview-1943670073


RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic

2024-03-12 Thread Yudi Zheng
Moving array construction within BigInteger.implMultiplyToLen intrinsic 
candidate to its caller simplifies the intrinsic implementation in JIT compiler.

-

Commit messages:
 - Simplify BigInteger.implMultiplyToLen intrinsic

Changes: https://git.openjdk.org/jdk/pull/18226/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk=18226=00
  Issue: https://bugs.openjdk.org/browse/JDK-8327964
  Stats: 53 lines in 2 files changed: 4 ins; 49 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/18226.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18226/head:pull/18226

PR: https://git.openjdk.org/jdk/pull/18226