Re: RFR: 8346239: Improve memory efficiency of JimageDiffGenerator

2024-12-19 Thread Fei Yang
On Thu, 19 Dec 2024 18:16:47 GMT, Severin Gehwolf  wrote:

> @MBaesken @RealFYang Could you please test this PR since you originally ran 
> into JDK-8346239 (this bug) and 
> [JDK-8344036](https://bugs.openjdk.org/browse/JDK-8344036). Thanks!

Yes, I can confirm that it works on my linux-aarch64 platform. Great! Thanks!

-

PR Comment: https://git.openjdk.org/jdk/pull/22835#issuecomment-2556122458


Re: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning

2024-11-06 Thread Fei Yang
On Tue, 5 Nov 2024 00:18:17 GMT, Fei Yang  wrote:

>>> Also, does this mean that the changes from 2 to frame::sender_sp_offset in 
>>> all of the lines (267, 271 and 273) should be reverted?
>>>
>> I think the previous lines are okay because we are constructing the fp, 
>> whereas in here we want to read the old fp stored in this frame.
>
>> As the same code on aarch64 and x86-64 uses `frame::sender_sp_offset` I 
>> suggested to change the literal 2 into `frame::sender_sp_offset` in order to 
>> increase the readability, but I forgot that `frame::sender_sp_offset` is 0 
>> on riscv64. However I do think it's a problem that several places throughout 
>> the code base uses a literal 2 when it should really be 
>> `frame::sender_sp_offset`. This type of code is very fiddly and I think we 
>> should do what we can to increase the readability, so maybe we need another 
>> `frame::XYZ` constant that is 2 for this case.
> 
> Yeah, I was also considering this issue when we were porting loom. I guess 
> maybe `frame::metadata_words` which equals 2. Since this is not the only 
> place, I would suggest we do a separate cleanup PR. 
> 
>> Also, does this mean that the changes from 2 to `frame::sender_sp_offset` in 
>> all of the lines (267, 271 and 273) should be reverted?
> 
> I agree with @pchilano in that we are fine with these places.

> Sorry, I also thought it matched the aarch64 one without checking. @RealFYang 
> should I change it for `hf.sp() + frame::link_offset` or just leave it as it 
> was?

Or maybe `hf.sp() - frame::metadata_words`. But since we have several other 
occurrences, I would suggest we leave it as it was and go with a separate PR 
for the cleanup.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828566395


Re: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning

2024-11-06 Thread Fei Yang
On Mon, 4 Nov 2024 18:23:23 GMT, Patricio Chilano Mateo 
 wrote:

>> Sorry, I also thought it matched the aarch64 one without checking. 
>> @RealFYang should I change it for `hf.sp() + frame::link_offset` or just 
>> leave it as it was?
>
>> Also, does this mean that the changes from 2 to frame::sender_sp_offset in 
>> all of the lines (267, 271 and 273) should be reverted?
>>
> I think the previous lines are okay because we are constructing the fp, 
> whereas in here we want to read the old fp stored in this frame.

> As the same code on aarch64 and x86-64 uses `frame::sender_sp_offset` I 
> suggested to change the literal 2 into `frame::sender_sp_offset` in order to 
> increase the readability, but I forgot that `frame::sender_sp_offset` is 0 on 
> riscv64. However I do think it's a problem that several places throughout the 
> code base uses a literal 2 when it should really be 
> `frame::sender_sp_offset`. This type of code is very fiddly and I think we 
> should do what we can to increase the readability, so maybe we need another 
> `frame::XYZ` constant that is 2 for this case.

Yeah, I was also considering this issue when we were porting loom. I guess 
maybe `frame::metadata_words` which equals 2. Since this is not the only place, 
I would suggest we do a separate cleanup PR. 

> Also, does this mean that the changes from 2 to `frame::sender_sp_offset` in 
> all of the lines (267, 271 and 273) should be reverted?

I agree with @pchilano in that we are fine with these places.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828563437


Re: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning

2024-11-06 Thread Fei Yang
On Thu, 31 Oct 2024 20:02:31 GMT, Patricio Chilano Mateo 
 wrote:

>> src/hotspot/cpu/riscv/continuationFreezeThaw_riscv.inline.hpp line 273:
>> 
>>> 271: ? frame_sp + fsize - frame::sender_sp_offset
>>> 272: // we need to re-read fp because it may be an oop and we might 
>>> have fixed the frame.
>>> 273: : *(intptr_t**)(hf.sp() - 2);
>> 
>> Suggestion:
>> 
>> : *(intptr_t**)(hf.sp() - frame::sender_sp_offset);
>
> Changed.

Note that `frame::sender_sp_offset` is 0 instead of 2 on linux-riscv64, which 
is different from aarch64 or x86-64. So I think we should revert this change: 
https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd.
 @pchilano : Could you please help do that? 

(PS: `hotspot_loom & jdk_loom` still test good with latest version after 
locally reverting 
https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd)

-

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826453713


Re: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning

2024-11-06 Thread Fei Yang
On Mon, 4 Nov 2024 18:18:38 GMT, Patricio Chilano Mateo 
 wrote:

>> Here's my suggested C2 change:
>> 
>> diff --git a/src/hotspot/cpu/aarch64/aarch64.ad 
>> b/src/hotspot/cpu/aarch64/aarch64.ad
>> index d9c77a2f529..1e99db191ae 100644
>> --- a/src/hotspot/cpu/aarch64/aarch64.ad
>> +++ b/src/hotspot/cpu/aarch64/aarch64.ad
>> @@ -3692,14 +3692,13 @@ encode %{
>>__ post_call_nop();
>>  } else {
>>Label retaddr;
>> +  // Make the anchor frame walkable
>>__ adr(rscratch2, retaddr);
>> +  __ str(rscratch2, Address(rthread, 
>> JavaThread::last_Java_pc_offset()));
>>__ lea(rscratch1, RuntimeAddress(entry));
>> -  // Leave a breadcrumb for JavaFrameAnchor::capture_last_Java_pc()
>> -  __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize)));
>>__ blr(rscratch1);
>>__ bind(retaddr);
>>__ post_call_nop();
>> -  __ add(sp, sp, 2 * wordSize);
>>  }
>>  if (Compile::current()->max_vector_size() > 0) {
>>__ reinitialize_ptrue();
>
> Great, thanks Dean. I removed `possibly_adjust_frame()` and the related code.
> @RealFYang I made the equivalent change for riscv, could you verify it's okay?

@pchilano : Hi, Great to see `possibly_adjust_frame()` go away. Nice cleanup!
`hotspot_loom jdk_loom` still test good with both release and fastdebug builds 
on linux-riscv64 platform.

BTW: I noticed one more return miss prediction case which I think was 
previously missed in 
https://github.com/openjdk/jdk/pull/21565/commits/32840de91953a5e50c85217f2a51fc5a901682a2
Do you mind adding following small addon change to fix it? Thanks.

diff --git a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp 
b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
index 84a292242c3..ac28f4b3514 100644
--- a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
+++ b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
@@ -1263,10 +1263,10 @@ address 
TemplateInterpreterGenerator::generate_native_entry(bool synchronized) {
   if (LockingMode != LM_LEGACY) {
 // Check preemption for Object.wait()
 Label not_preempted;
-__ ld(t0, Address(xthread, JavaThread::preempt_alternate_return_offset()));
-__ beqz(t0, not_preempted);
+__ ld(t1, Address(xthread, JavaThread::preempt_alternate_return_offset()));
+__ beqz(t1, not_preempted);
 __ sd(zr, Address(xthread, JavaThread::preempt_alternate_return_offset()));
-__ jr(t0);
+__ jr(t1);
 __ bind(native_return);
 __ restore_after_resume(true /* is_native */);
 // reload result_handler

-

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828797495


Re: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12]

2024-11-04 Thread Fei Yang
On Mon, 4 Nov 2024 18:18:38 GMT, Patricio Chilano Mateo 
 wrote:

>> Here's my suggested C2 change:
>> 
>> diff --git a/src/hotspot/cpu/aarch64/aarch64.ad 
>> b/src/hotspot/cpu/aarch64/aarch64.ad
>> index d9c77a2f529..1e99db191ae 100644
>> --- a/src/hotspot/cpu/aarch64/aarch64.ad
>> +++ b/src/hotspot/cpu/aarch64/aarch64.ad
>> @@ -3692,14 +3692,13 @@ encode %{
>>__ post_call_nop();
>>  } else {
>>Label retaddr;
>> +  // Make the anchor frame walkable
>>__ adr(rscratch2, retaddr);
>> +  __ str(rscratch2, Address(rthread, 
>> JavaThread::last_Java_pc_offset()));
>>__ lea(rscratch1, RuntimeAddress(entry));
>> -  // Leave a breadcrumb for JavaFrameAnchor::capture_last_Java_pc()
>> -  __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize)));
>>__ blr(rscratch1);
>>__ bind(retaddr);
>>__ post_call_nop();
>> -  __ add(sp, sp, 2 * wordSize);
>>  }
>>  if (Compile::current()->max_vector_size() > 0) {
>>__ reinitialize_ptrue();
>
> Great, thanks Dean. I removed `possibly_adjust_frame()` and the related code.
> @RealFYang I made the equivalent change for riscv, could you verify it's okay?

@pchilano : Hi, Great to see `possibly_adjust_frame()` go away. Nice cleanup!
`hotspot_loom jdk_loom` still test good with both release and fastdebug builds 
on linux-riscv64 platform.

BTW: I noticed one more return miss prediction case which I think was 
previously missed in 
https://github.com/openjdk/jdk/pull/21565/commits/32840de91953a5e50c85217f2a51fc5a901682a2
Do you mind adding following small addon change to fix it? Thanks.

diff --git a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp 
b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
index 84a292242c3..ac28f4b3514 100644
--- a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
+++ b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
@@ -1263,10 +1263,10 @@ address 
TemplateInterpreterGenerator::generate_native_entry(bool synchronized) {
   if (LockingMode != LM_LEGACY) {
 // Check preemption for Object.wait()
 Label not_preempted;
-__ ld(t0, Address(xthread, JavaThread::preempt_alternate_return_offset()));
-__ beqz(t0, not_preempted);
+__ ld(t1, Address(xthread, JavaThread::preempt_alternate_return_offset()));
+__ beqz(t1, not_preempted);
 __ sd(zr, Address(xthread, JavaThread::preempt_alternate_return_offset()));
-__ jr(t0);
+__ jr(t1);
 __ bind(native_return);
 __ restore_after_resume(true /* is_native */);
 // reload result_handler

-

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828797495


Re: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9]

2024-11-04 Thread Fei Yang
On Tue, 5 Nov 2024 00:18:17 GMT, Fei Yang  wrote:

>>> Also, does this mean that the changes from 2 to frame::sender_sp_offset in 
>>> all of the lines (267, 271 and 273) should be reverted?
>>>
>> I think the previous lines are okay because we are constructing the fp, 
>> whereas in here we want to read the old fp stored in this frame.
>
>> As the same code on aarch64 and x86-64 uses `frame::sender_sp_offset` I 
>> suggested to change the literal 2 into `frame::sender_sp_offset` in order to 
>> increase the readability, but I forgot that `frame::sender_sp_offset` is 0 
>> on riscv64. However I do think it's a problem that several places throughout 
>> the code base uses a literal 2 when it should really be 
>> `frame::sender_sp_offset`. This type of code is very fiddly and I think we 
>> should do what we can to increase the readability, so maybe we need another 
>> `frame::XYZ` constant that is 2 for this case.
> 
> Yeah, I was also considering this issue when we were porting loom. I guess 
> maybe `frame::metadata_words` which equals 2. Since this is not the only 
> place, I would suggest we do a separate cleanup PR. 
> 
>> Also, does this mean that the changes from 2 to `frame::sender_sp_offset` in 
>> all of the lines (267, 271 and 273) should be reverted?
> 
> I agree with @pchilano in that we are fine with these places.

> Sorry, I also thought it matched the aarch64 one without checking. @RealFYang 
> should I change it for `hf.sp() + frame::link_offset` or just leave it as it 
> was?

Or maybe `hf.sp() - frame::metadata_words`. But since we have several other 
occurrences, I would suggest we leave it as it was and go with a separate PR 
for the cleanup.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828566395


Re: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9]

2024-11-04 Thread Fei Yang
On Mon, 4 Nov 2024 18:23:23 GMT, Patricio Chilano Mateo 
 wrote:

>> Sorry, I also thought it matched the aarch64 one without checking. 
>> @RealFYang should I change it for `hf.sp() + frame::link_offset` or just 
>> leave it as it was?
>
>> Also, does this mean that the changes from 2 to frame::sender_sp_offset in 
>> all of the lines (267, 271 and 273) should be reverted?
>>
> I think the previous lines are okay because we are constructing the fp, 
> whereas in here we want to read the old fp stored in this frame.

> As the same code on aarch64 and x86-64 uses `frame::sender_sp_offset` I 
> suggested to change the literal 2 into `frame::sender_sp_offset` in order to 
> increase the readability, but I forgot that `frame::sender_sp_offset` is 0 on 
> riscv64. However I do think it's a problem that several places throughout the 
> code base uses a literal 2 when it should really be 
> `frame::sender_sp_offset`. This type of code is very fiddly and I think we 
> should do what we can to increase the readability, so maybe we need another 
> `frame::XYZ` constant that is 2 for this case.

Yeah, I was also considering this issue when we were porting loom. I guess 
maybe `frame::metadata_words` which equals 2. Since this is not the only place, 
I would suggest we do a separate cleanup PR. 

> Also, does this mean that the changes from 2 to `frame::sender_sp_offset` in 
> all of the lines (267, 271 and 273) should be reverted?

I agree with @pchilano in that we are fine with these places.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828563437


Re: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9]

2024-11-01 Thread Fei Yang
On Thu, 31 Oct 2024 20:02:31 GMT, Patricio Chilano Mateo 
 wrote:

>> src/hotspot/cpu/riscv/continuationFreezeThaw_riscv.inline.hpp line 273:
>> 
>>> 271: ? frame_sp + fsize - frame::sender_sp_offset
>>> 272: // we need to re-read fp because it may be an oop and we might 
>>> have fixed the frame.
>>> 273: : *(intptr_t**)(hf.sp() - 2);
>> 
>> Suggestion:
>> 
>> : *(intptr_t**)(hf.sp() - frame::sender_sp_offset);
>
> Changed.

Note that `frame::sender_sp_offset` is 0 instead of 2 on linux-riscv64, which 
is different from aarch64 or x86-64. So I think we should revert this change: 
https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd.
 @pchilano : Could you please help do that?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826453713


Re: RFR: 8337753: Target class of upcall stub may be unloaded

2024-09-04 Thread Fei Yang
On Wed, 4 Sep 2024 11:58:49 GMT, Jorn Vernee  wrote:

> Were you able to reproduce the original issue on RISC-V?

Yeah. I can reproduce similar crash on linux-riscv64 platform running this new 
test as well.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1743722070


Re: RFR: 8337753: Target class of upcall stub may be unloaded

2024-09-03 Thread Fei Yang
On Tue, 6 Aug 2024 17:26:55 GMT, Jorn Vernee  wrote:

> As discussed in the JBS issue:
> 
> FFM upcall stubs embed a `Method*` of the target method in the stub. This 
> `Method*` is read from the `LambdaForm::vmentry` field associated with the 
> target method handle at the time when the upcall stub is generated. The MH 
> instance itself is stashed in a global JNI ref. So, there should be a 
> reachability chain to the holder class: `MH (receiver) -> LF (form) -> 
> MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)`
> 
> However, it appears that, due to multiple threads racing to initialize the 
> `vmentry` field of the `LambdaForm` of the target method handle of an upcall 
> stub, it is possible that the `vmentry` is updated _after_ we embed the 
> corresponding `Method`* into an upcall stub (or rather, the latest update is 
> not visible to the thread generating the upcall stub). Technically, it is 
> fine to keep using a 'stale' `vmentry`, but the problem is that now the 
> reachability chain is broken, since the upcall stub only extracts the target 
> `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class 
> can then be unloaded, resulting in a crash.
> 
> The fix I've chosen for this is to mimic what we already do in 
> `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from 
> the target method handle each time. Luckily, this does not really seem to 
> impact performance.
> 
> 
> Performance numbers
> x64:
> 
> before:
> 
> Benchmark Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  69.216 ± 1.791  ns/op
> 
> 
> after:
> 
> Benchmark Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  67.787 ± 0.684  ns/op
> 
> 
> aarch64:
> 
> before:
> 
> Benchmark Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.574 ± 0.801  ns/op
> 
> 
> after:
> 
> Benchmark Mode  Cnt   Score   Error  Units
> Upcalls.panama_blank  avgt   30  61.218 ± 0.554  ns/op
> 
> 
> 
> As for the added TestUpcallStress test, it takes about 800 seconds to run 
> this test on the dev machine I'm using, so I've set the timeout quite high. 
> Since it runs for so long, I've dropped it from the default `jdk_foreign` 
> test suite, which runs in tier2. Instead the new test will run in tier4.
> 
> Testing: tier 1-4

src/hotspot/cpu/riscv/upcallLinker_riscv.cpp line 264:

> 262: 
> 263:   __ block_comment("{ load target ");
> 264:   __ movptr(j_rarg0, (intptr_t) receiver);

Hi @JornVernee , Could you please apply following small add-on change for 
linux-riscv64? As I witnessed build warning with GCC-13. Otherwise, builds fine 
and the newly-added test/jdk/java/foreign/TestUpcallStress.java is passing.


diff --git a/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp 
b/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp
index 5c45a679316..55160be99d0 100644
--- a/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp
+++ b/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp
@@ -261,7 +261,7 @@ address UpcallLinker::make_upcall_stub(jobject receiver, 
Symbol* signature,
   __ block_comment("} argument shuffle");

   __ block_comment("{ load target ");
-  __ movptr(j_rarg0, (intptr_t) receiver);
+  __ movptr(j_rarg0, (address) receiver);
   __ far_call(RuntimeAddress(StubRoutines::upcall_stub_load_target())); // 
loads Method* into xmethod
   __ block_comment("} load target ");

diff --git a/test/jdk/java/foreign/TestUpcallStress.java 
b/test/jdk/java/foreign/TestUpcallStress.java
index 3b9b1d4b207..40607746856 100644
--- a/test/jdk/java/foreign/TestUpcallStress.java
+++ b/test/jdk/java/foreign/TestUpcallStress.java
@@ -24,7 +24,7 @@
 /*
  * @test
  * @requires jdk.foreign.linker != "FALLBACK"
- * @requires os.arch == "aarch64" & os.name == "Linux"
+ * @requires (os.arch == "aarch64" | os.arch=="riscv64") & os.name == "Linux"
  * @requires os.maxMemory > 4G
  * @modules java.base/jdk.internal.foreign
  * @build NativeTestHelper CallGeneratorHelper TestUpcallBase

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1743130094


Re: RFR: 8334397: RISC-V: verify perf of ReverseBytesS/US

2024-06-23 Thread Fei Yang
On Fri, 21 Jun 2024 14:24:26 GMT, Hamlin Li  wrote:

> Hi,
> Can you help to review this patch?
> Thanks!
> 
> This is similar with previous JDK-8334396.
> Added some tests.
> 
> ### Test
> 
>   | Tests | Scores | Errors | Unit
> -- | -- | -- | -- | --
> Intrinsic, +zbb, +rvv | Characters.reverseBytes | 1654.535 | 69.36 | ns/op
>   | Shorts.reverseBytes | 1795.403 | 44.015 | ns/op
> Intrinsic, +zbb, -rvv | Characters.reverseBytes | 1649.752 | 74.965 | ns/op
>   | Shorts.reverseBytes | 1798.637 | 49.52 | ns/op
> Intrinsic, -zbb, +rvv | Characters.reverseBytes | 2279.588 | 44.222 | ns/op
>   | Shorts.reverseBytes | 2441.674 | 63.895 | ns/op
> Intrinsic, -zbb, -rvv | Characters.reverseBytes | 2288.876 | 49.099 | ns/op
>   | Shorts.reverseBytes | 2454.454 | 94.004 | ns/op
> No intrinsic | Characters.reverseBytes | 1629.722 | 23.656 | ns/op
>   | Shorts.reverseBytes | 2108.81 | 43.378 | ns/op
> 
> 

LGTM. Thanks.

-

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19830#pullrequestreview-2134513429


Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v6]

2024-05-24 Thread Fei Yang
On Wed, 22 May 2024 14:47:43 GMT, Yudi Zheng  wrote:

>> Moving array construction within BigInteger.implMultiplyToLen intrinsic 
>> candidate to its caller simplifies the intrinsic implementation in JIT 
>> compiler.
>
> Yudi Zheng has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   address comments.

Hi, RISC-V part of change seems fine. "java/math/BigInteger" test result is 
clean on linux-riscv64 platform. Thanks for the ping.

-

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18226#pullrequestreview-2076019194


Re: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v4]

2023-10-19 Thread Fei Yang
On Thu, 19 Oct 2023 12:14:52 GMT, Ilya Gavrilin  wrote:

>> Hi all, please review this changes into risc-v floating point copysign and 
>> signum intrinsics.
>> CopySign - returns first argument with the sign of second. On risc-v we have 
>> `fsgnj.x` instruction, which can implement this intrinsic.
>> Signum - returns input value if it is +/- 0.0 or NaN, otherwise 1.0 with the 
>> sign of input value returned.  On risc-v we can use `fclass.x` to specify 
>> type of input value and return appropriate value.
>> 
>> Tests:
>> Performance tests on t-head board:
>> With intrinsics:
>> 
>> Benchmark (seed)   Mode  Cnt  Score  Error   Units
>> MathBench.copySignDouble   0  thrpt8  34156.580 ±   76.272  ops/ms
>> MathBench.copySignFloat0  thrpt8  34181.731 ±   38.182  ops/ms
>> MathBench.signumDouble 0  thrpt8  31977.258 ± 1122.327  ops/ms
>> MathBench.signumFloat  0  thrpt8  31836.852 ±   56.013  ops/ms
>> 
>> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions 
>> -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`):
>> 
>> Benchmark (seed)   Mode  Cnt  Score  Error   Units
>> MathBench.copySignDouble   0  thrpt8  31000.996 ±  943.094  ops/ms
>> MathBench.copySignFloat0  thrpt8  30678.016 ±   28.087  ops/ms
>> MathBench.signumDouble 0  thrpt8  25435.010 ± 2047.085  ops/ms
>> MathBench.signumFloat  0  thrpt8  25257.058 ±   79.175  ops/ms
>> 
>> Regression tests: tier1, hotspot:tier2 on risc-v board.
>> 
>> Also, changed name of one micro test: before we had: `sigNumDouble` and 
>> `signumFloat` tests, they does not matches to `signum` or `sigNum`. Now we 
>> have similar part: `signum`.
>> Performance tests has been changed a bit, to check intrinsics result better, 
>> diff to modify tests:
>> 
>> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java 
>> b/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> index 6cd1353907e..0bee25366bf 100644
>> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> @@ -143,12 +143,12 @@ public double  ceilDouble() {
>>  
>>  @Benchmark
>>  public double  copySignDouble() {
>> -return  Math.copySign(double81, doubleNegative12);
>> +return  Math.copySign(double81, doubleNegative12) + 
>> Math.copySign(double81, double2) + Math.copySign(double4Dot1, 
>> doubleNegative12);
>>  }
>>  
>>  @Benchmark
>>  public float  copySignFloat() {
>> -return  Math.copySign(floatNegative99, float1);
>> +return  ...
>
> Ilya Gavrilin has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Changed branch inside signum implementation

Still good. You might want to correct the remaining typo.

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1660:

> 1658: // otherwise return +/- 1.0 using sign of input.
> 1659: // one - gives us a floating-point 1.0 (got from matching rule)
> 1660: // bool is_double - specififes single or double precision operations 
> will be used.

Suggestion: s/specififes/specifies/

-

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16186#pullrequestreview-1687549814
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1365427533


Re: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v3]

2023-10-18 Thread Fei Yang
On Wed, 18 Oct 2023 17:35:58 GMT, Ilya Gavrilin  wrote:

>> Hi all, please review this changes into risc-v floating point copysign and 
>> signum intrinsics.
>> CopySign - returns first argument with the sign of second. On risc-v we have 
>> `fsgnj.x` instruction, which can implement this intrinsic.
>> Signum - returns input value if it is +/- 0.0 or NaN, otherwise 1.0 with the 
>> sign of input value returned.  On risc-v we can use `fclass.x` to specify 
>> type of input value and return appropriate value.
>> 
>> Tests:
>> Performance tests on t-head board:
>> With intrinsics:
>> 
>> Benchmark (seed)   Mode  Cnt  Score  Error   Units
>> MathBench.copySignDouble   0  thrpt8  34156.580 ±   76.272  ops/ms
>> MathBench.copySignFloat0  thrpt8  34181.731 ±   38.182  ops/ms
>> MathBench.signumDouble 0  thrpt8  31977.258 ± 1122.327  ops/ms
>> MathBench.signumFloat  0  thrpt8  31836.852 ±   56.013  ops/ms
>> 
>> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions 
>> -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`):
>> 
>> Benchmark (seed)   Mode  Cnt  Score  Error   Units
>> MathBench.copySignDouble   0  thrpt8  31000.996 ±  943.094  ops/ms
>> MathBench.copySignFloat0  thrpt8  30678.016 ±   28.087  ops/ms
>> MathBench.signumDouble 0  thrpt8  25435.010 ± 2047.085  ops/ms
>> MathBench.signumFloat  0  thrpt8  25257.058 ±   79.175  ops/ms
>> 
>> Regression tests: tier1, hotspot:tier2 on risc-v board.
>> 
>> Also, changed name of one micro test: before we had: `sigNumDouble` and 
>> `signumFloat` tests, they does not matches to `signum` or `sigNum`. Now we 
>> have similar part: `signum`.
>> Performance tests has been changed a bit, to check intrinsics result better, 
>> diff to modify tests:
>> 
>> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java 
>> b/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> index 6cd1353907e..0bee25366bf 100644
>> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> @@ -143,12 +143,12 @@ public double  ceilDouble() {
>>  
>>  @Benchmark
>>  public double  copySignDouble() {
>> -return  Math.copySign(double81, doubleNegative12);
>> +return  Math.copySign(double81, doubleNegative12) + 
>> Math.copySign(double81, double2) + Math.copySign(double4Dot1, 
>> doubleNegative12);
>>  }
>>  
>>  @Benchmark
>>  public float  copySignFloat() {
>> -return  Math.copySign(floatNegative99, float1);
>> +return  ...
>
> Ilya Gavrilin has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Remove some effects and assert

Updated change looks good. Thanks.
JMH data on hifive unmatched for reference:

Before:
Benchmark (seed)   Mode  Cnt  Score   Error   Units
MathBench.copySignDouble   0  thrpt8  79728.042 ?  8211.438  ops/ms
MathBench.copySignFloat0  thrpt8  79516.930 ? 13163.477  ops/ms
MathBench.sigNumDouble 0  thrpt8  58204.403 ?  6795.238  ops/ms
MathBench.signumFloat  0  thrpt8  57882.056 ?  3635.354  ops/ms

After:
Benchmark (seed)   Mode  Cnt   Score   Error   Units
MathBench.copySignDouble   0  thrpt8  104301.832 ?  7170.917  ops/ms
MathBench.copySignFloat0  thrpt8  103008.851 ? 11722.187  ops/ms
MathBench.signumDouble 0  thrpt8   64465.030 ?  6849.148  ops/ms
MathBench.signumFloat  0  thrpt8   63987.290 ?  4298.311  ops/ms

-

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16186#pullrequestreview-1686455232


Re: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v2]

2023-10-18 Thread Fei Yang
On Tue, 17 Oct 2023 09:52:15 GMT, Ilya Gavrilin  wrote:

>> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1682:
>> 
>>> 1680:   // use floating-point 1.0 with a sign of input
>>> 1681:   is_double ? fsgnj_d(dst, one, src)
>>> 1682: : fsgnj_s(dst, one, src);
>> 
>> What if the `src` argument contains zero? Math.signum(float/double) is 
>> supposed to return zero if the argument is zero [1].
>> 
>> [1] 
>> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Math.java#L2602
>
> According to IEEE754, we can get positive or negative zero in the `src` 
> register (also positive zero can be named as zero) , and these cases included 
> to mask for the tmp1 (L1671-1676) and `src` value returned.

I see. Thanks for the answer. I can approve this once my other comments are 
resolved.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1363440282


Re: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v2]

2023-10-17 Thread Fei Yang
On Mon, 16 Oct 2023 21:14:39 GMT, Ilya Gavrilin  wrote:

>> Hi all, please review this changes into risc-v floating point copysign and 
>> signum intrinsics.
>> CopySign - returns first argument with the sign of second. On risc-v we have 
>> `fsgnj.x` instruction, which can implement this intrinsic.
>> Signum - returns input value if it is +/- 0.0 or NaN, otherwise 1.0 with the 
>> sign of input value returned.  On risc-v we can use `fclass.x` to specify 
>> type of input value and return appropriate value.
>> 
>> Tests:
>> Performance tests on t-head board:
>> With intrinsics:
>> 
>> Benchmark (seed)   Mode  Cnt  Score  Error   Units
>> MathBench.copySignDouble   0  thrpt8  34156.580 ±   76.272  ops/ms
>> MathBench.copySignFloat0  thrpt8  34181.731 ±   38.182  ops/ms
>> MathBench.signumDouble 0  thrpt8  31977.258 ± 1122.327  ops/ms
>> MathBench.signumFloat  0  thrpt8  31836.852 ±   56.013  ops/ms
>> 
>> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions 
>> -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`):
>> 
>> Benchmark (seed)   Mode  Cnt  Score  Error   Units
>> MathBench.copySignDouble   0  thrpt8  31000.996 ±  943.094  ops/ms
>> MathBench.copySignFloat0  thrpt8  30678.016 ±   28.087  ops/ms
>> MathBench.signumDouble 0  thrpt8  25435.010 ± 2047.085  ops/ms
>> MathBench.signumFloat  0  thrpt8  25257.058 ±   79.175  ops/ms
>> 
>> Regression tests: tier1, hotspot:tier2 on risc-v board.
>> 
>> Also, changed name of one micro test: before we had: `sigNumDouble` and 
>> `signumFloat` tests, they does not matches to `signum` or `sigNum`. Now we 
>> have similar part: `signum`.
>> Performance tests has been changed a bit, to check intrinsics result better, 
>> diff to modify tests:
>> 
>> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java 
>> b/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> index 6cd1353907e..0bee25366bf 100644
>> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> @@ -143,12 +143,12 @@ public double  ceilDouble() {
>>  
>>  @Benchmark
>>  public double  copySignDouble() {
>> -return  Math.copySign(double81, doubleNegative12);
>> +return  Math.copySign(double81, doubleNegative12) + 
>> Math.copySign(double81, double2) + Math.copySign(double4Dot1, 
>> doubleNegative12);
>>  }
>>  
>>  @Benchmark
>>  public float  copySignFloat() {
>> -return  Math.copySign(floatNegative99, float1);
>> +return  ...
>
> Ilya Gavrilin has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Fix some registers usages and typos

Changes requested by fyang (Reviewer).

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1659:

> 1657: // on input we have NaN or +/-0.0 value we should return it,
> 1658: // otherwise return +/- 1.0 using sign of input.
> 1659: // tmp1 - alias for t0 register,

Maybe remove this line (L1659) of the comment?

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1663:

> 1661: // bool is_double - specififes single or double precision operations 
> will be used.
> 1662: void C2_MacroAssembler::signum_fp(FloatRegister dst, FloatRegister src, 
> FloatRegister one, bool is_double) {
> 1663:   assert_different_registers(dst, src, one);

Any reason to keep the assertion?

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1682:

> 1680:   // use floating-point 1.0 with a sign of input
> 1681:   is_double ? fsgnj_d(dst, one, src)
> 1682: : fsgnj_s(dst, one, src);

What if the `src` argument contains zero? Math.signum(float/double) is supposed 
to return zero if the argument is zero [1].

[1] 
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Math.java#L2602

src/hotspot/cpu/riscv/riscv.ad line 7537:

> 7535: instruct signumD_reg(fRegD dst, fRegD src, immD zero, fRegD one) %{
> 7536:   match(Set dst (SignumD src (Binary zero one)));
> 7537:   effect(TEMP_DEF dst, USE src, USE one);

Any reason to keep this effect?

src/hotspot/cpu/riscv/riscv.ad line 7548:

> 7546: instruct signumF_reg(fRegF dst, fRegF src, immF zero, fRegF one) %{
> 7547:   match(Set dst (SignumF src (Binary zero one)));
> 7548:   effect(TEMP_DEF dst, USE src, USE one);

Any reason to keep this effect?

-

PR Review: https://git.openjdk.org/jdk/pull/16186#pullrequestreview-1681791129
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1361790804
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1361788992
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1361782859
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1361793207
PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1361793356


Re: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics

2023-10-16 Thread Fei Yang
On Fri, 13 Oct 2023 15:36:56 GMT, Ilya Gavrilin  wrote:

> Hi all, please review this changes into risc-v floating point copysign and 
> signum intrinsics.
> CopySign - returns first argument with the sign of second. On risc-v we have 
> `fsgnj.x` instruction, which can implement this intrinsic.
> Signum - returns input value if it is +/- 0.0 or NaN, otherwise 1.0 with the 
> sign of input value returned.  On risc-v we can use `fclass.x` to specify 
> type of input value and return appropriate value.
> 
> Tests:
> Performance tests on t-head board:
> With intrinsics:
> 
> Benchmark (seed)   Mode  Cnt  Score  Error   Units
> MathBench.copySignDouble   0  thrpt8  34156.580 ±   76.272  ops/ms
> MathBench.copySignFloat0  thrpt8  34181.731 ±   38.182  ops/ms
> MathBench.signumDouble 0  thrpt8  31977.258 ± 1122.327  ops/ms
> MathBench.signumFloat  0  thrpt8  31836.852 ±   56.013  ops/ms
> 
> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions 
> -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`):
> 
> Benchmark (seed)   Mode  Cnt  Score  Error   Units
> MathBench.copySignDouble   0  thrpt8  31000.996 ±  943.094  ops/ms
> MathBench.copySignFloat0  thrpt8  30678.016 ±   28.087  ops/ms
> MathBench.signumDouble 0  thrpt8  25435.010 ± 2047.085  ops/ms
> MathBench.signumFloat  0  thrpt8  25257.058 ±   79.175  ops/ms
> 
> Regression tests: tier1, hotspot:tier2 on risc-v board.
> 
> Also, changed name of one micro test: before we had: `sigNumDouble` and 
> `signumFloat` tests, they does not matches to `signum` or `sigNum`. Now we 
> have similar part: `signum`.
> Performance tests has been changed a bit, to check intrinsics result better, 
> diff to modify tests:
> 
> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java 
> b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> index 6cd1353907e..0bee25366bf 100644
> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> @@ -143,12 +143,12 @@ public double  ceilDouble() {
>  
>  @Benchmark
>  public double  copySignDouble() {
> -return  Math.copySign(double81, doubleNegative12);
> +return  Math.copySign(double81, doubleNegative12) + 
> Math.copySign(double81, double2) + Math.copySign(double4Dot1, 
> doubleNegative12);
>  }
>  
>  @Benchmark
>  public float  copySignFloat() {
> -return  Math.copySign(floatNegative99, float1);
> +return  Math.copySign(floatNegative99, float1) + 
> Math.copySign(eFloat, float1) + Math.copySign...

Some comments after a brief look.

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1659:

> 1657: // on input we have NaN or +/-0.0 value we should return it,
> 1658: // otherwise return +/- 1.0 using sign of input.
> 1659: // tmp1 - used to store result of fclass operation,

Seems to me that scratch register `t0` could be used in this function in order 
to save use of `tmp1`.

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1661:

> 1659: // tmp1 - used to store result of fclass operation,
> 1660: // one - gives us a floating-point 1.0 (got from matching rule)
> 1661: // bool single_precision - specififes single or double precision 
> operations will be used.

Suggestion: s/bool single_precision - specififes/bool is_double - specifies/

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1663:

> 1661: // bool single_precision - specififes single or double precision 
> operations will be used.
> 1662: void C2_MacroAssembler::signum_fp(FloatRegister dst, FloatRegister src, 
> FloatRegister one, Register tmp1, bool is_double) {
> 1663:   assert_different_registers(dst, src, one);

This constraint could be relaxed if we use scratch register `t0` here instead 
of `tmp`.

src/hotspot/cpu/riscv/riscv.ad line 7511:

> 7509: // Copysign and signum intrinsics
> 7510: 
> 7511: instruct copySignD_reg(fRegD dst, fRegD src1, fRegD src2, immD0 zero) %{

A more simpler `immD zero` will do here which is similar with the x86 
counterpart.

src/hotspot/cpu/riscv/riscv.ad line 7513:

> 7511: instruct copySignD_reg(fRegD dst, fRegD src1, fRegD src2, immD0 zero) %{
> 7512:   match(Set dst (CopySignD src1 (Binary src2 zero)));
> 7513:   effect(TEMP_DEF dst, USE src1, USE src2);

Unnecessary effect.

src/hotspot/cpu/riscv/riscv.ad line 7526:

> 7524: instruct copySignF_reg(fRegF dst, fRegF src1, fRegF src2) %{
> 7525:   match(Set dst (CopySignF src1 src2));
> 7526:   effect(TEMP_DEF dst, USE src1, USE src2);

Unnecessary effect.

src/hotspot/cpu/riscv/riscv.ad line 7537:

> 7535: %}
> 7536: 
> 7537: instruct signumD_reg(fRegD dst, fRegD src, fRegD zero, fRegD one, 
> iRegINoSp tmp1) %{

Use `immD zero` instread, which will help avoid reserving one FP register here.

src/hotspot/cpu/riscv/riscv.ad line 7539:

> 7537: instruct signumD_reg(fRegD dst, fRegD src, fRegD zero, fRegD one

Re: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5]

2023-04-26 Thread Fei Yang
On Tue, 21 Mar 2023 13:35:25 GMT, Per Minborg  wrote:

>> Per Minborg has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Add example for Option::captureStateLayout
>
> A review of all the copyright years shall be made in this PR.

@minborg : Hello, I think you should list Feilong Jiang (OpenJDK ID: fjiang) as 
co-author instead. It is him who added code for the riscv port.

-

PR Comment: https://git.openjdk.org/jdk/pull/13079#issuecomment-1523097913


Re: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5]

2023-04-06 Thread Fei Yang
On Tue, 21 Mar 2023 13:35:25 GMT, Per Minborg  wrote:

>> Per Minborg has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Add example for Option::captureStateLayout
>
> A review of all the copyright years shall be made in this PR.

> Hi @minborg, looks like some changes were missed on riscv port. I've added 
> these changes and submitted tests on linux-riscv. `jdk_foreign` still passed 
> with release & fatdebug build. Could you please add these extra changes for 
> riscv? Thanks. Here is the patch: 
> [foreign_riscv_port_patch.txt](https://github.com/openjdk/jdk/files/11037700/foreign_riscv_port_patch.txt)

@feilongjiang : Hello, the riscv-specific changes looks good to me. Thanks for 
the update.

-

PR Comment: https://git.openjdk.org/jdk/pull/13079#issuecomment-1499868465


Re: RFR: 8303863: RISC-V: TestArrayStructs.java fails after JDK-8303604

2023-03-10 Thread Fei Yang
On Thu, 9 Mar 2023 14:39:55 GMT, Feilong Jiang  wrote:

> [JDK-8303604](https://bugs.openjdk.org/browse/JDK-8303604) fixes an issue 
> with passing structs whose size is not a power of two on SysV and AArch64 
> platforms. The same issue happens on RISC-V.
> 
> Modifications are unnecessary for `STRUCT_REGISTER_F` and 
> `STRUCT_REGISTER_XF`. If there are no available registers, they will fall 
> back to `STRUCT_REGISTER_X`.
> 
> Testing:
> 
> - [x] TestArrayStructs.java 
> - [x] jdk_foreign on Unmatched Board (release build)

Looks fine to me.

-

Marked as reviewed by fyang (Reviewer).

PR: https://git.openjdk.org/jdk/pull/12950


Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter [v3]

2023-03-07 Thread Fei Yang
On Tue, 7 Mar 2023 21:38:38 GMT, Vladimir Kozlov  wrote:

>> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in 
>> Interpreter and C1 compiler to produce the same results as C2 intrinsics on 
>> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java 
>> methods were implemented originally.
>> 
>> Replaced `SharedRuntime::f2hf()` and `hf2f()` C runtime functions with calls 
>> to runtime stubs which use the same HW instructions as C2 intrinsics. Only 
>> for 64-bit x64 because 32-bit x86 stub does not work: result is passed 
>> through FPU register and NaN values become different from C2 intrinsic. This 
>> runtime stub is only used to calculate constant values during C2 compilation 
>> and can be skipped.
>> 
>> I added new tests based on Tobias's `TestAll.java` And copied 
>> `jdk/lang/Float/Binary16Conversion*.java` tests to run them with `-Xcomp` to 
>> make sure code is compiled by C1 or C2. I modified 
>> `Binary16ConversionNaN.java` to compare results from Interpreter, C1 and C2.
>> 
>> Tested tier1-5, Xcomp, stress
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Do not allow JIT compilation of Float.float16ToFloat and 
> Float.floatToFloat16

Hi, Thanks for handling linux-riscv64 at the same time.
Bad news is that we witnessed test failures when running following test with 
QEMU (no riscv hardware available with Zfhmin extension for now): 
test/hotspot/jtreg/compiler/intrinsics/float16/TestAllFloat16ToFloat.java


Exception in thread "main" java.lang.RuntimeException: Inconsistent result for 
Float.floatToFloat16(NaN/7fc0): 7e00 != fc01
at TestAllFloat16ToFloat.verify(TestAllFloat16ToFloat.java:62)
at TestAllFloat16ToFloat.run(TestAllFloat16ToFloat.java:72)
at TestAllFloat16ToFloat.main(TestAllFloat16ToFloat.java:94)


It looks like there is a problem when handling NaNs with fcvt.h.s/fmv.x.h and 
fmv.h.x/fcvt.s.h instructions at the bottom.
It's also possible to be an issue of QEMU as well. It would take quite a while 
to diagnose. But I don't want this to block this PR.
So I would prefer removing support of this feature for this port and adding 
back once this is resolved. And I have prepared a patch for that purpose. See 
attachement.
[12869-revert-riscv.txt](https://github.com/openjdk/jdk/files/10915606/12869-revert-riscv.txt)

-

PR: https://git.openjdk.org/jdk/pull/12869


Re: RFR: 8293841: RISC-V: Implementation of Foreign Function & Memory API (Preview) [v7]

2023-01-15 Thread Fei Yang
On Mon, 16 Jan 2023 06:38:35 GMT, Feilong Jiang  wrote:

>> Add experimental Foreign Function & Memory API support for RISC-V.
>> 
>> For details of the FFM API RISC-V port please refer to [JBS 
>> issue](https://bugs.openjdk.org/browse/JDK-8293841)
>> 
>> Testing:
>> 
>> - [x] jdk_foreign with release/fastdebug build on linux-riscv64
>> - [x] jdk_foreign with release/fastdebug build on linux-x64
>> - [x] jdk_foreign with release/fastdebug build on linux-aarch64
>
> Feilong Jiang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   more code style adjustment

Looks good to me. This also passed tier1-4 tests on my HiFive Unmatched boards.

-

Marked as reviewed by fyang (Reviewer).

PR: https://git.openjdk.org/jdk/pull/11004


Re: RFR: 8293841: RISC-V: Implementation of Foreign Function & Memory API (Preview) [v5]

2023-01-15 Thread Fei Yang
On Tue, 10 Jan 2023 15:15:23 GMT, Feilong Jiang  wrote:

>> Add experimental Foreign Function & Memory API support for RISC-V.
>> 
>> For details of the FFM API RISC-V port please refer to [JBS 
>> issue](https://bugs.openjdk.org/browse/JDK-8293841)
>> 
>> Testing:
>> 
>> - [x] jdk_foreign with release/fastdebug build on linux-riscv64
>> - [x] jdk_foreign with release/fastdebug build on linux-x64
>> - [x] jdk_foreign with release/fastdebug build on linux-aarch64
>
> Feilong Jiang has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix callArranger

Changes requested by fyang (Reviewer).

test/jdk/java/foreign/callarranger/TestRISCV64CallArranger.java line 3:

> 1: /*
> 2:  * Copyright (c) 2020, 2022, Oracle and/or its affiliates. All rights 
> reserved.
> 3:  * Copyright (c) 2023, Institute of Software, Chinese Academy of Sciences. 
> All rights reserved.

I think we should also break this into two separate lines like you do for other 
newly-added files.

-

PR: https://git.openjdk.org/jdk/pull/11004


Re: RFR: 8294198: Implement isFinite intrinsic for RISC-V

2022-09-26 Thread Fei Yang
On Thu, 22 Sep 2022 12:56:49 GMT, Aleksei Voitylov  
wrote:

> Unlike on x86 (see 8285868 and the discussion in review), isFinite intrinsic 
> turned out to be profitable on RISC-V using the same fclass instruction as 
> for 8293695 (isInfinite instrinsic). Therefore, I'm proposing to have it 
> added on RISC-V in this PR.
> 
> benchmark results:
> 
> before:
> 
> Benchmark  Mode  Cnt   Score   Error  Units
> DoubleClassCheck.testIsFiniteBranchavgt   15  52.824 ± 1.744  ns/op
> DoubleClassCheck.testIsFiniteCMov  avgt   15  16.104 ± 0.358  ns/op
> DoubleClassCheck.testIsFiniteStore avgt   15  14.366 ± 2.174  ns/op
> FloatClassCheck.testIsFiniteBranch avgt   15  49.821 ± 0.330  ns/op
> FloatClassCheck.testIsFiniteCMov   avgt   15  14.702 ± 0.335  ns/op
> FloatClassCheck.testIsFiniteStore  avgt   15  14.749 ± 0.496  ns/op
> 
> after:
> 
> DoubleClassCheck.testIsFiniteBranchavgt   15  48.921 ± 0.557  ns/op
> DoubleClassCheck.testIsFiniteCMov  avgt   15  13.716 ± 0.304  ns/op
> DoubleClassCheck.testIsFiniteStore avgt   15   9.152 ± 0.158  ns/op
> FloatClassCheck.testIsFiniteBranch avgt   15  47.740 ± 2.028  ns/op
> FloatClassCheck.testIsFiniteCMov   avgt   15  13.299 ± 0.282  ns/op
> FloatClassCheck.testIsFiniteStore  avgt   15   9.185 ± 0.396  ns/op
> 
> Existing isInfinite jtreg test was altered to be able to use common code for 
> isFinite test and fine-grained requires tag filtering. Existing benchmark was 
> modified to include isFinite case. A typo ("Atleast" -> "At least") was fixed 
> on the way.
> 
> Test passed on both release and fastdebug builds. Hotspot tier1 tests were 
> run on x86_64 and RISC-V with no issues.

I find the cause of the fluctuations for 'testIsFiniteBranch' lies in 
randomness of the input.

@Benchmark
@OperationsPerInvocation(BUFFER_SIZE)
public void testIsFiniteBranch() {
for (int i = 0; i < BUFFER_SIZE; i++) {
cmovOutputs[i] = Float.isFinite(inputs[i]) ? call() : 7;
}
}

Here the C2 JIT code for invoking 'call()' has changed with this patch. The 
register allocation is
different and hence the difference of saving and restoring of live registers 
around the method call. So the probability of invoking this method call will 
affect the JMH result, which is not the case for the other two JMH tests.

For the other two JMH tests, I see the performance gain on SiFive platform 
comes from C2 loop unrolling. Since your change benefit the other two JMH tests 
on both SiFive and C906 platforms, looks good to me.

-

Marked as reviewed by fyang (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10391


Re: RFR: 8294198: Implement isFinite intrinsic for RISC-V

2022-09-23 Thread Fei Yang
On Thu, 22 Sep 2022 12:56:49 GMT, Aleksei Voitylov  
wrote:

> Unlike on x86 (see 8285868 and the discussion in review), isFinite intrinsic 
> turned out to be profitable on RISC-V using the same fclass instruction as 
> for 8293695 (isInfinite instrinsic). Therefore, I'm proposing to have it 
> added on RISC-V in this PR.
> 
> benchmark results:
> 
> before:
> 
> Benchmark  Mode  Cnt   Score   Error  Units
> DoubleClassCheck.testIsFiniteBranchavgt   15  52.824 ± 1.744  ns/op
> DoubleClassCheck.testIsFiniteCMov  avgt   15  16.104 ± 0.358  ns/op
> DoubleClassCheck.testIsFiniteStore avgt   15  14.366 ± 2.174  ns/op
> FloatClassCheck.testIsFiniteBranch avgt   15  49.821 ± 0.330  ns/op
> FloatClassCheck.testIsFiniteCMov   avgt   15  14.702 ± 0.335  ns/op
> FloatClassCheck.testIsFiniteStore  avgt   15  14.749 ± 0.496  ns/op
> 
> after:
> 
> DoubleClassCheck.testIsFiniteBranchavgt   15  48.921 ± 0.557  ns/op
> DoubleClassCheck.testIsFiniteCMov  avgt   15  13.716 ± 0.304  ns/op
> DoubleClassCheck.testIsFiniteStore avgt   15   9.152 ± 0.158  ns/op
> FloatClassCheck.testIsFiniteBranch avgt   15  47.740 ± 2.028  ns/op
> FloatClassCheck.testIsFiniteCMov   avgt   15  13.299 ± 0.282  ns/op
> FloatClassCheck.testIsFiniteStore  avgt   15   9.185 ± 0.396  ns/op
> 
> Existing isInfinite jtreg test was altered to be able to use common code for 
> isFinite test and fine-grained requires tag filtering. Existing benchmark was 
> modified to include isFinite case. A typo ("Atleast" -> "At least") was fixed 
> on the way.
> 
> Test passed on both release and fastdebug builds. Hotspot tier1 tests were 
> run on x86_64 and RISC-V with no issues.

May I ask on which platform was the JMH test performed? I see some fluctuations 
for 'testIsFiniteBranch' on SiFive Unmatched board:

Before:
Benchmark  Mode  Cnt   Score   Error  Units
DoubleClassCheck.testIsFiniteBranchavgt   15  32.114 ± 3.514  ns/op
DoubleClassCheck.testIsFiniteCMov  avgt   15   8.656 ± 0.023  ns/op
DoubleClassCheck.testIsFiniteStore avgt   15   7.757 ± 1.691  ns/op
FloatClassCheck.testIsFiniteBranch avgt   15  32.446 ± 3.130  ns/op
FloatClassCheck.testIsFiniteCMov   avgt   15   7.801 ± 0.011  ns/op
FloatClassCheck.testIsFiniteStore  avgt   15   7.279 ± 1.483  ns/op

Benchmark  Mode  Cnt   Score   Error  Units
DoubleClassCheck.testIsFiniteBranchavgt   15  33.724 ± 5.157  ns/op
DoubleClassCheck.testIsFiniteCMov  avgt   15   8.642 ± 0.012  ns/op
DoubleClassCheck.testIsFiniteStore avgt   15   7.765 ± 1.711  ns/op
FloatClassCheck.testIsFiniteBranch avgt   15  31.401 ± 4.104  ns/op
FloatClassCheck.testIsFiniteCMov   avgt   15   7.799 ± 0.014  ns/op
FloatClassCheck.testIsFiniteStore  avgt   15   8.234 ± 0.023  ns/op

Benchmark  Mode  Cnt   Score   Error  Units
DoubleClassCheck.testIsFiniteBranchavgt   15  32.461 ± 3.688  ns/op
DoubleClassCheck.testIsFiniteCMov  avgt   15   8.643 ± 0.007  ns/op
DoubleClassCheck.testIsFiniteStore avgt   15   7.772 ± 1.701  ns/op
FloatClassCheck.testIsFiniteBranch avgt   15  33.258 ± 3.131  ns/op
FloatClassCheck.testIsFiniteCMov   avgt   15   7.815 ± 0.014  ns/op
FloatClassCheck.testIsFiniteStore  avgt   15   6.343 ± 1.475  ns/op

After:
Benchmark  Mode  Cnt   Score   Error  Units
DoubleClassCheck.testIsFiniteBranchavgt   15  33.915 ± 4.193  ns/op
DoubleClassCheck.testIsFiniteCMov  avgt   15   7.736 ± 0.012  ns/op
DoubleClassCheck.testIsFiniteStore avgt   15   5.931 ± 0.007  ns/op
FloatClassCheck.testIsFiniteBranch avgt   15  33.845 ± 3.449  ns/op
FloatClassCheck.testIsFiniteCMov   avgt   15   7.543 ± 0.013  ns/op
FloatClassCheck.testIsFiniteStore  avgt   15   6.035 ± 0.006  ns/op

Benchmark  Mode  Cnt   Score   Error  Units
DoubleClassCheck.testIsFiniteBranchavgt   15  33.421 ± 4.262  ns/op
DoubleClassCheck.testIsFiniteCMov  avgt   15   7.734 ± 0.012  ns/op
DoubleClassCheck.testIsFiniteStore avgt   15   5.940 ± 0.010  ns/op
FloatClassCheck.testIsFiniteBranch avgt   15  33.348 ± 2.656  ns/op
FloatClassCheck.testIsFiniteCMov   avgt   15   7.542 ± 0.013  ns/op
FloatClassCheck.testIsFiniteStore  avgt   15   6.040 ± 0.004  ns/op

Benchmark  Mode  Cnt   Score   Error  Units
DoubleClassCheck.testIsFiniteBranchavgt   15  31.669 ± 2.517  ns/op
DoubleClassCheck.testIsFiniteCMov  avgt   15   7.761 ± 0.092  ns/op
DoubleClassCheck.testIsFiniteStore avgt   15   5.940 ± 0.007  ns/op
FloatClassCheck.testIsFiniteBranch avgt   15  33.508 ± 3.680  ns/op
FloatClassCheck.testIsFiniteCMov   avgt   15   7.534 ± 0.009  ns/op
FloatClassCheck.testIsFiniteStore  avgt   15   6.031 ± 0.007  ns/op

-

PR: https://git.openjdk.org/jdk/pull/10391


Re: RFR: 8292407: Improve Weak CAS VarHandle/Unsafe tests resilience under spurious failures [v4]

2022-08-21 Thread Fei Yang
On Fri, 19 Aug 2022 18:14:16 GMT, Aleksey Shipilev  wrote:

>> We have a few reports that existing Weak* VarHandle tests are still flaky, 
>> for example on large AArch64 machines or small RISC-V machines.
>> 
>> The flakiness is intrinsic to the nature of Weak* operations under tests, 
>> that can spuriously fail. The last attempt to fix these was 
>> [JDK-8155739](https://bugs.openjdk.org/browse/JDK-8155739). We need to 
>> strengthen these a bit more.
>> 
>> The actual values depend on the successful testing on known-failing 
>> platforms. I ballparked bumping the attempts 5x and introducing the delay 
>> would help without exploding test time in worst cases.
>
> Aleksey Shipilev has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now contains six commits:
> 
>  - Rework timeouts
>  - Merge branch 'master' into JDK-8292407-varhandle-weak-resilient
>  - Merge branch 'master' into JDK-8292407-varhandle-weak-resilient
>  - Update copyrights
>  - Also do Unsafe tests
>  - Fix

The latest changes look good to me. I see all these tests pass stably on my 
aarch64-linux server.
This wll also improve the case for my riscv64-linux unmatched board: except for 
VarHandleTestMethodHandleAccess*.java, the other tests could also pass stably 
now.

-

Marked as reviewed by fyang (Reviewer).

PR: https://git.openjdk.org/jdk/pull/9889