Re: RFR: 8346239: Improve memory efficiency of JimageDiffGenerator
On Thu, 19 Dec 2024 18:16:47 GMT, Severin Gehwolf wrote: > @MBaesken @RealFYang Could you please test this PR since you originally ran > into JDK-8346239 (this bug) and > [JDK-8344036](https://bugs.openjdk.org/browse/JDK-8344036). Thanks! Yes, I can confirm that it works on my linux-aarch64 platform. Great! Thanks! - PR Comment: https://git.openjdk.org/jdk/pull/22835#issuecomment-2556122458
Re: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
On Tue, 5 Nov 2024 00:18:17 GMT, Fei Yang wrote: >>> Also, does this mean that the changes from 2 to frame::sender_sp_offset in >>> all of the lines (267, 271 and 273) should be reverted? >>> >> I think the previous lines are okay because we are constructing the fp, >> whereas in here we want to read the old fp stored in this frame. > >> As the same code on aarch64 and x86-64 uses `frame::sender_sp_offset` I >> suggested to change the literal 2 into `frame::sender_sp_offset` in order to >> increase the readability, but I forgot that `frame::sender_sp_offset` is 0 >> on riscv64. However I do think it's a problem that several places throughout >> the code base uses a literal 2 when it should really be >> `frame::sender_sp_offset`. This type of code is very fiddly and I think we >> should do what we can to increase the readability, so maybe we need another >> `frame::XYZ` constant that is 2 for this case. > > Yeah, I was also considering this issue when we were porting loom. I guess > maybe `frame::metadata_words` which equals 2. Since this is not the only > place, I would suggest we do a separate cleanup PR. > >> Also, does this mean that the changes from 2 to `frame::sender_sp_offset` in >> all of the lines (267, 271 and 273) should be reverted? > > I agree with @pchilano in that we are fine with these places. > Sorry, I also thought it matched the aarch64 one without checking. @RealFYang > should I change it for `hf.sp() + frame::link_offset` or just leave it as it > was? Or maybe `hf.sp() - frame::metadata_words`. But since we have several other occurrences, I would suggest we leave it as it was and go with a separate PR for the cleanup. - PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828566395
Re: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
On Mon, 4 Nov 2024 18:23:23 GMT, Patricio Chilano Mateo wrote: >> Sorry, I also thought it matched the aarch64 one without checking. >> @RealFYang should I change it for `hf.sp() + frame::link_offset` or just >> leave it as it was? > >> Also, does this mean that the changes from 2 to frame::sender_sp_offset in >> all of the lines (267, 271 and 273) should be reverted? >> > I think the previous lines are okay because we are constructing the fp, > whereas in here we want to read the old fp stored in this frame. > As the same code on aarch64 and x86-64 uses `frame::sender_sp_offset` I > suggested to change the literal 2 into `frame::sender_sp_offset` in order to > increase the readability, but I forgot that `frame::sender_sp_offset` is 0 on > riscv64. However I do think it's a problem that several places throughout the > code base uses a literal 2 when it should really be > `frame::sender_sp_offset`. This type of code is very fiddly and I think we > should do what we can to increase the readability, so maybe we need another > `frame::XYZ` constant that is 2 for this case. Yeah, I was also considering this issue when we were porting loom. I guess maybe `frame::metadata_words` which equals 2. Since this is not the only place, I would suggest we do a separate cleanup PR. > Also, does this mean that the changes from 2 to `frame::sender_sp_offset` in > all of the lines (267, 271 and 273) should be reverted? I agree with @pchilano in that we are fine with these places. - PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828563437
Re: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
On Thu, 31 Oct 2024 20:02:31 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/riscv/continuationFreezeThaw_riscv.inline.hpp line 273: >> >>> 271: ? frame_sp + fsize - frame::sender_sp_offset >>> 272: // we need to re-read fp because it may be an oop and we might >>> have fixed the frame. >>> 273: : *(intptr_t**)(hf.sp() - 2); >> >> Suggestion: >> >> : *(intptr_t**)(hf.sp() - frame::sender_sp_offset); > > Changed. Note that `frame::sender_sp_offset` is 0 instead of 2 on linux-riscv64, which is different from aarch64 or x86-64. So I think we should revert this change: https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd. @pchilano : Could you please help do that? (PS: `hotspot_loom & jdk_loom` still test good with latest version after locally reverting https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd) - PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826453713
Re: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
On Mon, 4 Nov 2024 18:18:38 GMT, Patricio Chilano Mateo wrote: >> Here's my suggested C2 change: >> >> diff --git a/src/hotspot/cpu/aarch64/aarch64.ad >> b/src/hotspot/cpu/aarch64/aarch64.ad >> index d9c77a2f529..1e99db191ae 100644 >> --- a/src/hotspot/cpu/aarch64/aarch64.ad >> +++ b/src/hotspot/cpu/aarch64/aarch64.ad >> @@ -3692,14 +3692,13 @@ encode %{ >>__ post_call_nop(); >> } else { >>Label retaddr; >> + // Make the anchor frame walkable >>__ adr(rscratch2, retaddr); >> + __ str(rscratch2, Address(rthread, >> JavaThread::last_Java_pc_offset())); >>__ lea(rscratch1, RuntimeAddress(entry)); >> - // Leave a breadcrumb for JavaFrameAnchor::capture_last_Java_pc() >> - __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize))); >>__ blr(rscratch1); >>__ bind(retaddr); >>__ post_call_nop(); >> - __ add(sp, sp, 2 * wordSize); >> } >> if (Compile::current()->max_vector_size() > 0) { >>__ reinitialize_ptrue(); > > Great, thanks Dean. I removed `possibly_adjust_frame()` and the related code. > @RealFYang I made the equivalent change for riscv, could you verify it's okay? @pchilano : Hi, Great to see `possibly_adjust_frame()` go away. Nice cleanup! `hotspot_loom jdk_loom` still test good with both release and fastdebug builds on linux-riscv64 platform. BTW: I noticed one more return miss prediction case which I think was previously missed in https://github.com/openjdk/jdk/pull/21565/commits/32840de91953a5e50c85217f2a51fc5a901682a2 Do you mind adding following small addon change to fix it? Thanks. diff --git a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp index 84a292242c3..ac28f4b3514 100644 --- a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp +++ b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp @@ -1263,10 +1263,10 @@ address TemplateInterpreterGenerator::generate_native_entry(bool synchronized) { if (LockingMode != LM_LEGACY) { // Check preemption for Object.wait() Label not_preempted; -__ ld(t0, Address(xthread, JavaThread::preempt_alternate_return_offset())); -__ beqz(t0, not_preempted); +__ ld(t1, Address(xthread, JavaThread::preempt_alternate_return_offset())); +__ beqz(t1, not_preempted); __ sd(zr, Address(xthread, JavaThread::preempt_alternate_return_offset())); -__ jr(t0); +__ jr(t1); __ bind(native_return); __ restore_after_resume(true /* is_native */); // reload result_handler - PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828797495
Re: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12]
On Mon, 4 Nov 2024 18:18:38 GMT, Patricio Chilano Mateo wrote: >> Here's my suggested C2 change: >> >> diff --git a/src/hotspot/cpu/aarch64/aarch64.ad >> b/src/hotspot/cpu/aarch64/aarch64.ad >> index d9c77a2f529..1e99db191ae 100644 >> --- a/src/hotspot/cpu/aarch64/aarch64.ad >> +++ b/src/hotspot/cpu/aarch64/aarch64.ad >> @@ -3692,14 +3692,13 @@ encode %{ >>__ post_call_nop(); >> } else { >>Label retaddr; >> + // Make the anchor frame walkable >>__ adr(rscratch2, retaddr); >> + __ str(rscratch2, Address(rthread, >> JavaThread::last_Java_pc_offset())); >>__ lea(rscratch1, RuntimeAddress(entry)); >> - // Leave a breadcrumb for JavaFrameAnchor::capture_last_Java_pc() >> - __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize))); >>__ blr(rscratch1); >>__ bind(retaddr); >>__ post_call_nop(); >> - __ add(sp, sp, 2 * wordSize); >> } >> if (Compile::current()->max_vector_size() > 0) { >>__ reinitialize_ptrue(); > > Great, thanks Dean. I removed `possibly_adjust_frame()` and the related code. > @RealFYang I made the equivalent change for riscv, could you verify it's okay? @pchilano : Hi, Great to see `possibly_adjust_frame()` go away. Nice cleanup! `hotspot_loom jdk_loom` still test good with both release and fastdebug builds on linux-riscv64 platform. BTW: I noticed one more return miss prediction case which I think was previously missed in https://github.com/openjdk/jdk/pull/21565/commits/32840de91953a5e50c85217f2a51fc5a901682a2 Do you mind adding following small addon change to fix it? Thanks. diff --git a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp index 84a292242c3..ac28f4b3514 100644 --- a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp +++ b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp @@ -1263,10 +1263,10 @@ address TemplateInterpreterGenerator::generate_native_entry(bool synchronized) { if (LockingMode != LM_LEGACY) { // Check preemption for Object.wait() Label not_preempted; -__ ld(t0, Address(xthread, JavaThread::preempt_alternate_return_offset())); -__ beqz(t0, not_preempted); +__ ld(t1, Address(xthread, JavaThread::preempt_alternate_return_offset())); +__ beqz(t1, not_preempted); __ sd(zr, Address(xthread, JavaThread::preempt_alternate_return_offset())); -__ jr(t0); +__ jr(t1); __ bind(native_return); __ restore_after_resume(true /* is_native */); // reload result_handler - PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828797495
Re: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9]
On Tue, 5 Nov 2024 00:18:17 GMT, Fei Yang wrote: >>> Also, does this mean that the changes from 2 to frame::sender_sp_offset in >>> all of the lines (267, 271 and 273) should be reverted? >>> >> I think the previous lines are okay because we are constructing the fp, >> whereas in here we want to read the old fp stored in this frame. > >> As the same code on aarch64 and x86-64 uses `frame::sender_sp_offset` I >> suggested to change the literal 2 into `frame::sender_sp_offset` in order to >> increase the readability, but I forgot that `frame::sender_sp_offset` is 0 >> on riscv64. However I do think it's a problem that several places throughout >> the code base uses a literal 2 when it should really be >> `frame::sender_sp_offset`. This type of code is very fiddly and I think we >> should do what we can to increase the readability, so maybe we need another >> `frame::XYZ` constant that is 2 for this case. > > Yeah, I was also considering this issue when we were porting loom. I guess > maybe `frame::metadata_words` which equals 2. Since this is not the only > place, I would suggest we do a separate cleanup PR. > >> Also, does this mean that the changes from 2 to `frame::sender_sp_offset` in >> all of the lines (267, 271 and 273) should be reverted? > > I agree with @pchilano in that we are fine with these places. > Sorry, I also thought it matched the aarch64 one without checking. @RealFYang > should I change it for `hf.sp() + frame::link_offset` or just leave it as it > was? Or maybe `hf.sp() - frame::metadata_words`. But since we have several other occurrences, I would suggest we leave it as it was and go with a separate PR for the cleanup. - PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828566395
Re: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9]
On Mon, 4 Nov 2024 18:23:23 GMT, Patricio Chilano Mateo wrote: >> Sorry, I also thought it matched the aarch64 one without checking. >> @RealFYang should I change it for `hf.sp() + frame::link_offset` or just >> leave it as it was? > >> Also, does this mean that the changes from 2 to frame::sender_sp_offset in >> all of the lines (267, 271 and 273) should be reverted? >> > I think the previous lines are okay because we are constructing the fp, > whereas in here we want to read the old fp stored in this frame. > As the same code on aarch64 and x86-64 uses `frame::sender_sp_offset` I > suggested to change the literal 2 into `frame::sender_sp_offset` in order to > increase the readability, but I forgot that `frame::sender_sp_offset` is 0 on > riscv64. However I do think it's a problem that several places throughout the > code base uses a literal 2 when it should really be > `frame::sender_sp_offset`. This type of code is very fiddly and I think we > should do what we can to increase the readability, so maybe we need another > `frame::XYZ` constant that is 2 for this case. Yeah, I was also considering this issue when we were porting loom. I guess maybe `frame::metadata_words` which equals 2. Since this is not the only place, I would suggest we do a separate cleanup PR. > Also, does this mean that the changes from 2 to `frame::sender_sp_offset` in > all of the lines (267, 271 and 273) should be reverted? I agree with @pchilano in that we are fine with these places. - PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828563437
Re: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9]
On Thu, 31 Oct 2024 20:02:31 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/riscv/continuationFreezeThaw_riscv.inline.hpp line 273: >> >>> 271: ? frame_sp + fsize - frame::sender_sp_offset >>> 272: // we need to re-read fp because it may be an oop and we might >>> have fixed the frame. >>> 273: : *(intptr_t**)(hf.sp() - 2); >> >> Suggestion: >> >> : *(intptr_t**)(hf.sp() - frame::sender_sp_offset); > > Changed. Note that `frame::sender_sp_offset` is 0 instead of 2 on linux-riscv64, which is different from aarch64 or x86-64. So I think we should revert this change: https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd. @pchilano : Could you please help do that? - PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826453713
Re: RFR: 8337753: Target class of upcall stub may be unloaded
On Wed, 4 Sep 2024 11:58:49 GMT, Jorn Vernee wrote: > Were you able to reproduce the original issue on RISC-V? Yeah. I can reproduce similar crash on linux-riscv64 platform running this new test as well. - PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1743722070
Re: RFR: 8337753: Target class of upcall stub may be unloaded
On Tue, 6 Aug 2024 17:26:55 GMT, Jorn Vernee wrote: > As discussed in the JBS issue: > > FFM upcall stubs embed a `Method*` of the target method in the stub. This > `Method*` is read from the `LambdaForm::vmentry` field associated with the > target method handle at the time when the upcall stub is generated. The MH > instance itself is stashed in a global JNI ref. So, there should be a > reachability chain to the holder class: `MH (receiver) -> LF (form) -> > MemberName (vmentry) -> ResolvedMethodName (method) -> Class (vmholder)` > > However, it appears that, due to multiple threads racing to initialize the > `vmentry` field of the `LambdaForm` of the target method handle of an upcall > stub, it is possible that the `vmentry` is updated _after_ we embed the > corresponding `Method`* into an upcall stub (or rather, the latest update is > not visible to the thread generating the upcall stub). Technically, it is > fine to keep using a 'stale' `vmentry`, but the problem is that now the > reachability chain is broken, since the upcall stub only extracts the target > `Method*`, and doesn't keep the stale `vmentry` reachable. The holder class > can then be unloaded, resulting in a crash. > > The fix I've chosen for this is to mimic what we already do in > `MethodHandles::jump_to_lambda_form`, and re-load the `vmentry` field from > the target method handle each time. Luckily, this does not really seem to > impact performance. > > > Performance numbers > x64: > > before: > > Benchmark Mode Cnt Score Error Units > Upcalls.panama_blank avgt 30 69.216 ± 1.791 ns/op > > > after: > > Benchmark Mode Cnt Score Error Units > Upcalls.panama_blank avgt 30 67.787 ± 0.684 ns/op > > > aarch64: > > before: > > Benchmark Mode Cnt Score Error Units > Upcalls.panama_blank avgt 30 61.574 ± 0.801 ns/op > > > after: > > Benchmark Mode Cnt Score Error Units > Upcalls.panama_blank avgt 30 61.218 ± 0.554 ns/op > > > > As for the added TestUpcallStress test, it takes about 800 seconds to run > this test on the dev machine I'm using, so I've set the timeout quite high. > Since it runs for so long, I've dropped it from the default `jdk_foreign` > test suite, which runs in tier2. Instead the new test will run in tier4. > > Testing: tier 1-4 src/hotspot/cpu/riscv/upcallLinker_riscv.cpp line 264: > 262: > 263: __ block_comment("{ load target "); > 264: __ movptr(j_rarg0, (intptr_t) receiver); Hi @JornVernee , Could you please apply following small add-on change for linux-riscv64? As I witnessed build warning with GCC-13. Otherwise, builds fine and the newly-added test/jdk/java/foreign/TestUpcallStress.java is passing. diff --git a/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp b/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp index 5c45a679316..55160be99d0 100644 --- a/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp +++ b/src/hotspot/cpu/riscv/upcallLinker_riscv.cpp @@ -261,7 +261,7 @@ address UpcallLinker::make_upcall_stub(jobject receiver, Symbol* signature, __ block_comment("} argument shuffle"); __ block_comment("{ load target "); - __ movptr(j_rarg0, (intptr_t) receiver); + __ movptr(j_rarg0, (address) receiver); __ far_call(RuntimeAddress(StubRoutines::upcall_stub_load_target())); // loads Method* into xmethod __ block_comment("} load target "); diff --git a/test/jdk/java/foreign/TestUpcallStress.java b/test/jdk/java/foreign/TestUpcallStress.java index 3b9b1d4b207..40607746856 100644 --- a/test/jdk/java/foreign/TestUpcallStress.java +++ b/test/jdk/java/foreign/TestUpcallStress.java @@ -24,7 +24,7 @@ /* * @test * @requires jdk.foreign.linker != "FALLBACK" - * @requires os.arch == "aarch64" & os.name == "Linux" + * @requires (os.arch == "aarch64" | os.arch=="riscv64") & os.name == "Linux" * @requires os.maxMemory > 4G * @modules java.base/jdk.internal.foreign * @build NativeTestHelper CallGeneratorHelper TestUpcallBase - PR Review Comment: https://git.openjdk.org/jdk/pull/20479#discussion_r1743130094
Re: RFR: 8334397: RISC-V: verify perf of ReverseBytesS/US
On Fri, 21 Jun 2024 14:24:26 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > Thanks! > > This is similar with previous JDK-8334396. > Added some tests. > > ### Test > > | Tests | Scores | Errors | Unit > -- | -- | -- | -- | -- > Intrinsic, +zbb, +rvv | Characters.reverseBytes | 1654.535 | 69.36 | ns/op > | Shorts.reverseBytes | 1795.403 | 44.015 | ns/op > Intrinsic, +zbb, -rvv | Characters.reverseBytes | 1649.752 | 74.965 | ns/op > | Shorts.reverseBytes | 1798.637 | 49.52 | ns/op > Intrinsic, -zbb, +rvv | Characters.reverseBytes | 2279.588 | 44.222 | ns/op > | Shorts.reverseBytes | 2441.674 | 63.895 | ns/op > Intrinsic, -zbb, -rvv | Characters.reverseBytes | 2288.876 | 49.099 | ns/op > | Shorts.reverseBytes | 2454.454 | 94.004 | ns/op > No intrinsic | Characters.reverseBytes | 1629.722 | 23.656 | ns/op > | Shorts.reverseBytes | 2108.81 | 43.378 | ns/op > > LGTM. Thanks. - Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19830#pullrequestreview-2134513429
Re: RFR: 8327964: Simplify BigInteger.implMultiplyToLen intrinsic [v6]
On Wed, 22 May 2024 14:47:43 GMT, Yudi Zheng wrote: >> Moving array construction within BigInteger.implMultiplyToLen intrinsic >> candidate to its caller simplifies the intrinsic implementation in JIT >> compiler. > > Yudi Zheng has updated the pull request incrementally with one additional > commit since the last revision: > > address comments. Hi, RISC-V part of change seems fine. "java/math/BigInteger" test result is clean on linux-riscv64 platform. Thanks for the ping. - Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18226#pullrequestreview-2076019194
Re: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v4]
On Thu, 19 Oct 2023 12:14:52 GMT, Ilya Gavrilin wrote: >> Hi all, please review this changes into risc-v floating point copysign and >> signum intrinsics. >> CopySign - returns first argument with the sign of second. On risc-v we have >> `fsgnj.x` instruction, which can implement this intrinsic. >> Signum - returns input value if it is +/- 0.0 or NaN, otherwise 1.0 with the >> sign of input value returned. On risc-v we can use `fclass.x` to specify >> type of input value and return appropriate value. >> >> Tests: >> Performance tests on t-head board: >> With intrinsics: >> >> Benchmark (seed) Mode Cnt Score Error Units >> MathBench.copySignDouble 0 thrpt8 34156.580 ± 76.272 ops/ms >> MathBench.copySignFloat0 thrpt8 34181.731 ± 38.182 ops/ms >> MathBench.signumDouble 0 thrpt8 31977.258 ± 1122.327 ops/ms >> MathBench.signumFloat 0 thrpt8 31836.852 ± 56.013 ops/ms >> >> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions >> -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`): >> >> Benchmark (seed) Mode Cnt Score Error Units >> MathBench.copySignDouble 0 thrpt8 31000.996 ± 943.094 ops/ms >> MathBench.copySignFloat0 thrpt8 30678.016 ± 28.087 ops/ms >> MathBench.signumDouble 0 thrpt8 25435.010 ± 2047.085 ops/ms >> MathBench.signumFloat 0 thrpt8 25257.058 ± 79.175 ops/ms >> >> Regression tests: tier1, hotspot:tier2 on risc-v board. >> >> Also, changed name of one micro test: before we had: `sigNumDouble` and >> `signumFloat` tests, they does not matches to `signum` or `sigNum`. Now we >> have similar part: `signum`. >> Performance tests has been changed a bit, to check intrinsics result better, >> diff to modify tests: >> >> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java >> b/test/micro/org/openjdk/bench/java/lang/MathBench.java >> index 6cd1353907e..0bee25366bf 100644 >> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java >> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java >> @@ -143,12 +143,12 @@ public double ceilDouble() { >> >> @Benchmark >> public double copySignDouble() { >> -return Math.copySign(double81, doubleNegative12); >> +return Math.copySign(double81, doubleNegative12) + >> Math.copySign(double81, double2) + Math.copySign(double4Dot1, >> doubleNegative12); >> } >> >> @Benchmark >> public float copySignFloat() { >> -return Math.copySign(floatNegative99, float1); >> +return ... > > Ilya Gavrilin has updated the pull request incrementally with one additional > commit since the last revision: > > Changed branch inside signum implementation Still good. You might want to correct the remaining typo. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1660: > 1658: // otherwise return +/- 1.0 using sign of input. > 1659: // one - gives us a floating-point 1.0 (got from matching rule) > 1660: // bool is_double - specififes single or double precision operations > will be used. Suggestion: s/specififes/specifies/ - Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16186#pullrequestreview-1687549814 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1365427533
Re: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v3]
On Wed, 18 Oct 2023 17:35:58 GMT, Ilya Gavrilin wrote: >> Hi all, please review this changes into risc-v floating point copysign and >> signum intrinsics. >> CopySign - returns first argument with the sign of second. On risc-v we have >> `fsgnj.x` instruction, which can implement this intrinsic. >> Signum - returns input value if it is +/- 0.0 or NaN, otherwise 1.0 with the >> sign of input value returned. On risc-v we can use `fclass.x` to specify >> type of input value and return appropriate value. >> >> Tests: >> Performance tests on t-head board: >> With intrinsics: >> >> Benchmark (seed) Mode Cnt Score Error Units >> MathBench.copySignDouble 0 thrpt8 34156.580 ± 76.272 ops/ms >> MathBench.copySignFloat0 thrpt8 34181.731 ± 38.182 ops/ms >> MathBench.signumDouble 0 thrpt8 31977.258 ± 1122.327 ops/ms >> MathBench.signumFloat 0 thrpt8 31836.852 ± 56.013 ops/ms >> >> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions >> -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`): >> >> Benchmark (seed) Mode Cnt Score Error Units >> MathBench.copySignDouble 0 thrpt8 31000.996 ± 943.094 ops/ms >> MathBench.copySignFloat0 thrpt8 30678.016 ± 28.087 ops/ms >> MathBench.signumDouble 0 thrpt8 25435.010 ± 2047.085 ops/ms >> MathBench.signumFloat 0 thrpt8 25257.058 ± 79.175 ops/ms >> >> Regression tests: tier1, hotspot:tier2 on risc-v board. >> >> Also, changed name of one micro test: before we had: `sigNumDouble` and >> `signumFloat` tests, they does not matches to `signum` or `sigNum`. Now we >> have similar part: `signum`. >> Performance tests has been changed a bit, to check intrinsics result better, >> diff to modify tests: >> >> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java >> b/test/micro/org/openjdk/bench/java/lang/MathBench.java >> index 6cd1353907e..0bee25366bf 100644 >> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java >> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java >> @@ -143,12 +143,12 @@ public double ceilDouble() { >> >> @Benchmark >> public double copySignDouble() { >> -return Math.copySign(double81, doubleNegative12); >> +return Math.copySign(double81, doubleNegative12) + >> Math.copySign(double81, double2) + Math.copySign(double4Dot1, >> doubleNegative12); >> } >> >> @Benchmark >> public float copySignFloat() { >> -return Math.copySign(floatNegative99, float1); >> +return ... > > Ilya Gavrilin has updated the pull request incrementally with one additional > commit since the last revision: > > Remove some effects and assert Updated change looks good. Thanks. JMH data on hifive unmatched for reference: Before: Benchmark (seed) Mode Cnt Score Error Units MathBench.copySignDouble 0 thrpt8 79728.042 ? 8211.438 ops/ms MathBench.copySignFloat0 thrpt8 79516.930 ? 13163.477 ops/ms MathBench.sigNumDouble 0 thrpt8 58204.403 ? 6795.238 ops/ms MathBench.signumFloat 0 thrpt8 57882.056 ? 3635.354 ops/ms After: Benchmark (seed) Mode Cnt Score Error Units MathBench.copySignDouble 0 thrpt8 104301.832 ? 7170.917 ops/ms MathBench.copySignFloat0 thrpt8 103008.851 ? 11722.187 ops/ms MathBench.signumDouble 0 thrpt8 64465.030 ? 6849.148 ops/ms MathBench.signumFloat 0 thrpt8 63987.290 ? 4298.311 ops/ms - Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16186#pullrequestreview-1686455232
Re: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v2]
On Tue, 17 Oct 2023 09:52:15 GMT, Ilya Gavrilin wrote: >> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1682: >> >>> 1680: // use floating-point 1.0 with a sign of input >>> 1681: is_double ? fsgnj_d(dst, one, src) >>> 1682: : fsgnj_s(dst, one, src); >> >> What if the `src` argument contains zero? Math.signum(float/double) is >> supposed to return zero if the argument is zero [1]. >> >> [1] >> https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Math.java#L2602 > > According to IEEE754, we can get positive or negative zero in the `src` > register (also positive zero can be named as zero) , and these cases included > to mask for the tmp1 (L1671-1676) and `src` value returned. I see. Thanks for the answer. I can approve this once my other comments are resolved. - PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1363440282
Re: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics [v2]
On Mon, 16 Oct 2023 21:14:39 GMT, Ilya Gavrilin wrote: >> Hi all, please review this changes into risc-v floating point copysign and >> signum intrinsics. >> CopySign - returns first argument with the sign of second. On risc-v we have >> `fsgnj.x` instruction, which can implement this intrinsic. >> Signum - returns input value if it is +/- 0.0 or NaN, otherwise 1.0 with the >> sign of input value returned. On risc-v we can use `fclass.x` to specify >> type of input value and return appropriate value. >> >> Tests: >> Performance tests on t-head board: >> With intrinsics: >> >> Benchmark (seed) Mode Cnt Score Error Units >> MathBench.copySignDouble 0 thrpt8 34156.580 ± 76.272 ops/ms >> MathBench.copySignFloat0 thrpt8 34181.731 ± 38.182 ops/ms >> MathBench.signumDouble 0 thrpt8 31977.258 ± 1122.327 ops/ms >> MathBench.signumFloat 0 thrpt8 31836.852 ± 56.013 ops/ms >> >> Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions >> -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`): >> >> Benchmark (seed) Mode Cnt Score Error Units >> MathBench.copySignDouble 0 thrpt8 31000.996 ± 943.094 ops/ms >> MathBench.copySignFloat0 thrpt8 30678.016 ± 28.087 ops/ms >> MathBench.signumDouble 0 thrpt8 25435.010 ± 2047.085 ops/ms >> MathBench.signumFloat 0 thrpt8 25257.058 ± 79.175 ops/ms >> >> Regression tests: tier1, hotspot:tier2 on risc-v board. >> >> Also, changed name of one micro test: before we had: `sigNumDouble` and >> `signumFloat` tests, they does not matches to `signum` or `sigNum`. Now we >> have similar part: `signum`. >> Performance tests has been changed a bit, to check intrinsics result better, >> diff to modify tests: >> >> diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java >> b/test/micro/org/openjdk/bench/java/lang/MathBench.java >> index 6cd1353907e..0bee25366bf 100644 >> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java >> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java >> @@ -143,12 +143,12 @@ public double ceilDouble() { >> >> @Benchmark >> public double copySignDouble() { >> -return Math.copySign(double81, doubleNegative12); >> +return Math.copySign(double81, doubleNegative12) + >> Math.copySign(double81, double2) + Math.copySign(double4Dot1, >> doubleNegative12); >> } >> >> @Benchmark >> public float copySignFloat() { >> -return Math.copySign(floatNegative99, float1); >> +return ... > > Ilya Gavrilin has updated the pull request incrementally with one additional > commit since the last revision: > > Fix some registers usages and typos Changes requested by fyang (Reviewer). src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1659: > 1657: // on input we have NaN or +/-0.0 value we should return it, > 1658: // otherwise return +/- 1.0 using sign of input. > 1659: // tmp1 - alias for t0 register, Maybe remove this line (L1659) of the comment? src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1663: > 1661: // bool is_double - specififes single or double precision operations > will be used. > 1662: void C2_MacroAssembler::signum_fp(FloatRegister dst, FloatRegister src, > FloatRegister one, bool is_double) { > 1663: assert_different_registers(dst, src, one); Any reason to keep the assertion? src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1682: > 1680: // use floating-point 1.0 with a sign of input > 1681: is_double ? fsgnj_d(dst, one, src) > 1682: : fsgnj_s(dst, one, src); What if the `src` argument contains zero? Math.signum(float/double) is supposed to return zero if the argument is zero [1]. [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Math.java#L2602 src/hotspot/cpu/riscv/riscv.ad line 7537: > 7535: instruct signumD_reg(fRegD dst, fRegD src, immD zero, fRegD one) %{ > 7536: match(Set dst (SignumD src (Binary zero one))); > 7537: effect(TEMP_DEF dst, USE src, USE one); Any reason to keep this effect? src/hotspot/cpu/riscv/riscv.ad line 7548: > 7546: instruct signumF_reg(fRegF dst, fRegF src, immF zero, fRegF one) %{ > 7547: match(Set dst (SignumF src (Binary zero one))); > 7548: effect(TEMP_DEF dst, USE src, USE one); Any reason to keep this effect? - PR Review: https://git.openjdk.org/jdk/pull/16186#pullrequestreview-1681791129 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1361790804 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1361788992 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1361782859 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1361793207 PR Review Comment: https://git.openjdk.org/jdk/pull/16186#discussion_r1361793356
Re: RFR: 8317971: RISC-V: implement copySignF/D and signumF/D intrinsics
On Fri, 13 Oct 2023 15:36:56 GMT, Ilya Gavrilin wrote: > Hi all, please review this changes into risc-v floating point copysign and > signum intrinsics. > CopySign - returns first argument with the sign of second. On risc-v we have > `fsgnj.x` instruction, which can implement this intrinsic. > Signum - returns input value if it is +/- 0.0 or NaN, otherwise 1.0 with the > sign of input value returned. On risc-v we can use `fclass.x` to specify > type of input value and return appropriate value. > > Tests: > Performance tests on t-head board: > With intrinsics: > > Benchmark (seed) Mode Cnt Score Error Units > MathBench.copySignDouble 0 thrpt8 34156.580 ± 76.272 ops/ms > MathBench.copySignFloat0 thrpt8 34181.731 ± 38.182 ops/ms > MathBench.signumDouble 0 thrpt8 31977.258 ± 1122.327 ops/ms > MathBench.signumFloat 0 thrpt8 31836.852 ± 56.013 ops/ms > > Intrinsics turned off (`-XX:+UnlockDiagnosticVMOptions > -XX:-UseCopySignIntrinsic -XX:-UseSignumIntrinsic`): > > Benchmark (seed) Mode Cnt Score Error Units > MathBench.copySignDouble 0 thrpt8 31000.996 ± 943.094 ops/ms > MathBench.copySignFloat0 thrpt8 30678.016 ± 28.087 ops/ms > MathBench.signumDouble 0 thrpt8 25435.010 ± 2047.085 ops/ms > MathBench.signumFloat 0 thrpt8 25257.058 ± 79.175 ops/ms > > Regression tests: tier1, hotspot:tier2 on risc-v board. > > Also, changed name of one micro test: before we had: `sigNumDouble` and > `signumFloat` tests, they does not matches to `signum` or `sigNum`. Now we > have similar part: `signum`. > Performance tests has been changed a bit, to check intrinsics result better, > diff to modify tests: > > diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java > b/test/micro/org/openjdk/bench/java/lang/MathBench.java > index 6cd1353907e..0bee25366bf 100644 > --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java > +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java > @@ -143,12 +143,12 @@ public double ceilDouble() { > > @Benchmark > public double copySignDouble() { > -return Math.copySign(double81, doubleNegative12); > +return Math.copySign(double81, doubleNegative12) + > Math.copySign(double81, double2) + Math.copySign(double4Dot1, > doubleNegative12); > } > > @Benchmark > public float copySignFloat() { > -return Math.copySign(floatNegative99, float1); > +return Math.copySign(floatNegative99, float1) + > Math.copySign(eFloat, float1) + Math.copySign... Some comments after a brief look. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1659: > 1657: // on input we have NaN or +/-0.0 value we should return it, > 1658: // otherwise return +/- 1.0 using sign of input. > 1659: // tmp1 - used to store result of fclass operation, Seems to me that scratch register `t0` could be used in this function in order to save use of `tmp1`. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1661: > 1659: // tmp1 - used to store result of fclass operation, > 1660: // one - gives us a floating-point 1.0 (got from matching rule) > 1661: // bool single_precision - specififes single or double precision > operations will be used. Suggestion: s/bool single_precision - specififes/bool is_double - specifies/ src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1663: > 1661: // bool single_precision - specififes single or double precision > operations will be used. > 1662: void C2_MacroAssembler::signum_fp(FloatRegister dst, FloatRegister src, > FloatRegister one, Register tmp1, bool is_double) { > 1663: assert_different_registers(dst, src, one); This constraint could be relaxed if we use scratch register `t0` here instead of `tmp`. src/hotspot/cpu/riscv/riscv.ad line 7511: > 7509: // Copysign and signum intrinsics > 7510: > 7511: instruct copySignD_reg(fRegD dst, fRegD src1, fRegD src2, immD0 zero) %{ A more simpler `immD zero` will do here which is similar with the x86 counterpart. src/hotspot/cpu/riscv/riscv.ad line 7513: > 7511: instruct copySignD_reg(fRegD dst, fRegD src1, fRegD src2, immD0 zero) %{ > 7512: match(Set dst (CopySignD src1 (Binary src2 zero))); > 7513: effect(TEMP_DEF dst, USE src1, USE src2); Unnecessary effect. src/hotspot/cpu/riscv/riscv.ad line 7526: > 7524: instruct copySignF_reg(fRegF dst, fRegF src1, fRegF src2) %{ > 7525: match(Set dst (CopySignF src1 src2)); > 7526: effect(TEMP_DEF dst, USE src1, USE src2); Unnecessary effect. src/hotspot/cpu/riscv/riscv.ad line 7537: > 7535: %} > 7536: > 7537: instruct signumD_reg(fRegD dst, fRegD src, fRegD zero, fRegD one, > iRegINoSp tmp1) %{ Use `immD zero` instread, which will help avoid reserving one FP register here. src/hotspot/cpu/riscv/riscv.ad line 7539: > 7537: instruct signumD_reg(fRegD dst, fRegD src, fRegD zero, fRegD one
Re: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5]
On Tue, 21 Mar 2023 13:35:25 GMT, Per Minborg wrote: >> Per Minborg has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Add example for Option::captureStateLayout > > A review of all the copyright years shall be made in this PR. @minborg : Hello, I think you should list Feilong Jiang (OpenJDK ID: fjiang) as co-author instead. It is him who added code for the riscv port. - PR Comment: https://git.openjdk.org/jdk/pull/13079#issuecomment-1523097913
Re: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5]
On Tue, 21 Mar 2023 13:35:25 GMT, Per Minborg wrote: >> Per Minborg has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Add example for Option::captureStateLayout > > A review of all the copyright years shall be made in this PR. > Hi @minborg, looks like some changes were missed on riscv port. I've added > these changes and submitted tests on linux-riscv. `jdk_foreign` still passed > with release & fatdebug build. Could you please add these extra changes for > riscv? Thanks. Here is the patch: > [foreign_riscv_port_patch.txt](https://github.com/openjdk/jdk/files/11037700/foreign_riscv_port_patch.txt) @feilongjiang : Hello, the riscv-specific changes looks good to me. Thanks for the update. - PR Comment: https://git.openjdk.org/jdk/pull/13079#issuecomment-1499868465
Re: RFR: 8303863: RISC-V: TestArrayStructs.java fails after JDK-8303604
On Thu, 9 Mar 2023 14:39:55 GMT, Feilong Jiang wrote: > [JDK-8303604](https://bugs.openjdk.org/browse/JDK-8303604) fixes an issue > with passing structs whose size is not a power of two on SysV and AArch64 > platforms. The same issue happens on RISC-V. > > Modifications are unnecessary for `STRUCT_REGISTER_F` and > `STRUCT_REGISTER_XF`. If there are no available registers, they will fall > back to `STRUCT_REGISTER_X`. > > Testing: > > - [x] TestArrayStructs.java > - [x] jdk_foreign on Unmatched Board (release build) Looks fine to me. - Marked as reviewed by fyang (Reviewer). PR: https://git.openjdk.org/jdk/pull/12950
Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter [v3]
On Tue, 7 Mar 2023 21:38:38 GMT, Vladimir Kozlov wrote: >> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in >> Interpreter and C1 compiler to produce the same results as C2 intrinsics on >> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java >> methods were implemented originally. >> >> Replaced `SharedRuntime::f2hf()` and `hf2f()` C runtime functions with calls >> to runtime stubs which use the same HW instructions as C2 intrinsics. Only >> for 64-bit x64 because 32-bit x86 stub does not work: result is passed >> through FPU register and NaN values become different from C2 intrinsic. This >> runtime stub is only used to calculate constant values during C2 compilation >> and can be skipped. >> >> I added new tests based on Tobias's `TestAll.java` And copied >> `jdk/lang/Float/Binary16Conversion*.java` tests to run them with `-Xcomp` to >> make sure code is compiled by C1 or C2. I modified >> `Binary16ConversionNaN.java` to compare results from Interpreter, C1 and C2. >> >> Tested tier1-5, Xcomp, stress > > Vladimir Kozlov has updated the pull request incrementally with one > additional commit since the last revision: > > Do not allow JIT compilation of Float.float16ToFloat and > Float.floatToFloat16 Hi, Thanks for handling linux-riscv64 at the same time. Bad news is that we witnessed test failures when running following test with QEMU (no riscv hardware available with Zfhmin extension for now): test/hotspot/jtreg/compiler/intrinsics/float16/TestAllFloat16ToFloat.java Exception in thread "main" java.lang.RuntimeException: Inconsistent result for Float.floatToFloat16(NaN/7fc0): 7e00 != fc01 at TestAllFloat16ToFloat.verify(TestAllFloat16ToFloat.java:62) at TestAllFloat16ToFloat.run(TestAllFloat16ToFloat.java:72) at TestAllFloat16ToFloat.main(TestAllFloat16ToFloat.java:94) It looks like there is a problem when handling NaNs with fcvt.h.s/fmv.x.h and fmv.h.x/fcvt.s.h instructions at the bottom. It's also possible to be an issue of QEMU as well. It would take quite a while to diagnose. But I don't want this to block this PR. So I would prefer removing support of this feature for this port and adding back once this is resolved. And I have prepared a patch for that purpose. See attachement. [12869-revert-riscv.txt](https://github.com/openjdk/jdk/files/10915606/12869-revert-riscv.txt) - PR: https://git.openjdk.org/jdk/pull/12869
Re: RFR: 8293841: RISC-V: Implementation of Foreign Function & Memory API (Preview) [v7]
On Mon, 16 Jan 2023 06:38:35 GMT, Feilong Jiang wrote: >> Add experimental Foreign Function & Memory API support for RISC-V. >> >> For details of the FFM API RISC-V port please refer to [JBS >> issue](https://bugs.openjdk.org/browse/JDK-8293841) >> >> Testing: >> >> - [x] jdk_foreign with release/fastdebug build on linux-riscv64 >> - [x] jdk_foreign with release/fastdebug build on linux-x64 >> - [x] jdk_foreign with release/fastdebug build on linux-aarch64 > > Feilong Jiang has updated the pull request incrementally with one additional > commit since the last revision: > > more code style adjustment Looks good to me. This also passed tier1-4 tests on my HiFive Unmatched boards. - Marked as reviewed by fyang (Reviewer). PR: https://git.openjdk.org/jdk/pull/11004
Re: RFR: 8293841: RISC-V: Implementation of Foreign Function & Memory API (Preview) [v5]
On Tue, 10 Jan 2023 15:15:23 GMT, Feilong Jiang wrote: >> Add experimental Foreign Function & Memory API support for RISC-V. >> >> For details of the FFM API RISC-V port please refer to [JBS >> issue](https://bugs.openjdk.org/browse/JDK-8293841) >> >> Testing: >> >> - [x] jdk_foreign with release/fastdebug build on linux-riscv64 >> - [x] jdk_foreign with release/fastdebug build on linux-x64 >> - [x] jdk_foreign with release/fastdebug build on linux-aarch64 > > Feilong Jiang has updated the pull request incrementally with one additional > commit since the last revision: > > fix callArranger Changes requested by fyang (Reviewer). test/jdk/java/foreign/callarranger/TestRISCV64CallArranger.java line 3: > 1: /* > 2: * Copyright (c) 2020, 2022, Oracle and/or its affiliates. All rights > reserved. > 3: * Copyright (c) 2023, Institute of Software, Chinese Academy of Sciences. > All rights reserved. I think we should also break this into two separate lines like you do for other newly-added files. - PR: https://git.openjdk.org/jdk/pull/11004
Re: RFR: 8294198: Implement isFinite intrinsic for RISC-V
On Thu, 22 Sep 2022 12:56:49 GMT, Aleksei Voitylov wrote: > Unlike on x86 (see 8285868 and the discussion in review), isFinite intrinsic > turned out to be profitable on RISC-V using the same fclass instruction as > for 8293695 (isInfinite instrinsic). Therefore, I'm proposing to have it > added on RISC-V in this PR. > > benchmark results: > > before: > > Benchmark Mode Cnt Score Error Units > DoubleClassCheck.testIsFiniteBranchavgt 15 52.824 ± 1.744 ns/op > DoubleClassCheck.testIsFiniteCMov avgt 15 16.104 ± 0.358 ns/op > DoubleClassCheck.testIsFiniteStore avgt 15 14.366 ± 2.174 ns/op > FloatClassCheck.testIsFiniteBranch avgt 15 49.821 ± 0.330 ns/op > FloatClassCheck.testIsFiniteCMov avgt 15 14.702 ± 0.335 ns/op > FloatClassCheck.testIsFiniteStore avgt 15 14.749 ± 0.496 ns/op > > after: > > DoubleClassCheck.testIsFiniteBranchavgt 15 48.921 ± 0.557 ns/op > DoubleClassCheck.testIsFiniteCMov avgt 15 13.716 ± 0.304 ns/op > DoubleClassCheck.testIsFiniteStore avgt 15 9.152 ± 0.158 ns/op > FloatClassCheck.testIsFiniteBranch avgt 15 47.740 ± 2.028 ns/op > FloatClassCheck.testIsFiniteCMov avgt 15 13.299 ± 0.282 ns/op > FloatClassCheck.testIsFiniteStore avgt 15 9.185 ± 0.396 ns/op > > Existing isInfinite jtreg test was altered to be able to use common code for > isFinite test and fine-grained requires tag filtering. Existing benchmark was > modified to include isFinite case. A typo ("Atleast" -> "At least") was fixed > on the way. > > Test passed on both release and fastdebug builds. Hotspot tier1 tests were > run on x86_64 and RISC-V with no issues. I find the cause of the fluctuations for 'testIsFiniteBranch' lies in randomness of the input. @Benchmark @OperationsPerInvocation(BUFFER_SIZE) public void testIsFiniteBranch() { for (int i = 0; i < BUFFER_SIZE; i++) { cmovOutputs[i] = Float.isFinite(inputs[i]) ? call() : 7; } } Here the C2 JIT code for invoking 'call()' has changed with this patch. The register allocation is different and hence the difference of saving and restoring of live registers around the method call. So the probability of invoking this method call will affect the JMH result, which is not the case for the other two JMH tests. For the other two JMH tests, I see the performance gain on SiFive platform comes from C2 loop unrolling. Since your change benefit the other two JMH tests on both SiFive and C906 platforms, looks good to me. - Marked as reviewed by fyang (Reviewer). PR: https://git.openjdk.org/jdk/pull/10391
Re: RFR: 8294198: Implement isFinite intrinsic for RISC-V
On Thu, 22 Sep 2022 12:56:49 GMT, Aleksei Voitylov wrote: > Unlike on x86 (see 8285868 and the discussion in review), isFinite intrinsic > turned out to be profitable on RISC-V using the same fclass instruction as > for 8293695 (isInfinite instrinsic). Therefore, I'm proposing to have it > added on RISC-V in this PR. > > benchmark results: > > before: > > Benchmark Mode Cnt Score Error Units > DoubleClassCheck.testIsFiniteBranchavgt 15 52.824 ± 1.744 ns/op > DoubleClassCheck.testIsFiniteCMov avgt 15 16.104 ± 0.358 ns/op > DoubleClassCheck.testIsFiniteStore avgt 15 14.366 ± 2.174 ns/op > FloatClassCheck.testIsFiniteBranch avgt 15 49.821 ± 0.330 ns/op > FloatClassCheck.testIsFiniteCMov avgt 15 14.702 ± 0.335 ns/op > FloatClassCheck.testIsFiniteStore avgt 15 14.749 ± 0.496 ns/op > > after: > > DoubleClassCheck.testIsFiniteBranchavgt 15 48.921 ± 0.557 ns/op > DoubleClassCheck.testIsFiniteCMov avgt 15 13.716 ± 0.304 ns/op > DoubleClassCheck.testIsFiniteStore avgt 15 9.152 ± 0.158 ns/op > FloatClassCheck.testIsFiniteBranch avgt 15 47.740 ± 2.028 ns/op > FloatClassCheck.testIsFiniteCMov avgt 15 13.299 ± 0.282 ns/op > FloatClassCheck.testIsFiniteStore avgt 15 9.185 ± 0.396 ns/op > > Existing isInfinite jtreg test was altered to be able to use common code for > isFinite test and fine-grained requires tag filtering. Existing benchmark was > modified to include isFinite case. A typo ("Atleast" -> "At least") was fixed > on the way. > > Test passed on both release and fastdebug builds. Hotspot tier1 tests were > run on x86_64 and RISC-V with no issues. May I ask on which platform was the JMH test performed? I see some fluctuations for 'testIsFiniteBranch' on SiFive Unmatched board: Before: Benchmark Mode Cnt Score Error Units DoubleClassCheck.testIsFiniteBranchavgt 15 32.114 ± 3.514 ns/op DoubleClassCheck.testIsFiniteCMov avgt 15 8.656 ± 0.023 ns/op DoubleClassCheck.testIsFiniteStore avgt 15 7.757 ± 1.691 ns/op FloatClassCheck.testIsFiniteBranch avgt 15 32.446 ± 3.130 ns/op FloatClassCheck.testIsFiniteCMov avgt 15 7.801 ± 0.011 ns/op FloatClassCheck.testIsFiniteStore avgt 15 7.279 ± 1.483 ns/op Benchmark Mode Cnt Score Error Units DoubleClassCheck.testIsFiniteBranchavgt 15 33.724 ± 5.157 ns/op DoubleClassCheck.testIsFiniteCMov avgt 15 8.642 ± 0.012 ns/op DoubleClassCheck.testIsFiniteStore avgt 15 7.765 ± 1.711 ns/op FloatClassCheck.testIsFiniteBranch avgt 15 31.401 ± 4.104 ns/op FloatClassCheck.testIsFiniteCMov avgt 15 7.799 ± 0.014 ns/op FloatClassCheck.testIsFiniteStore avgt 15 8.234 ± 0.023 ns/op Benchmark Mode Cnt Score Error Units DoubleClassCheck.testIsFiniteBranchavgt 15 32.461 ± 3.688 ns/op DoubleClassCheck.testIsFiniteCMov avgt 15 8.643 ± 0.007 ns/op DoubleClassCheck.testIsFiniteStore avgt 15 7.772 ± 1.701 ns/op FloatClassCheck.testIsFiniteBranch avgt 15 33.258 ± 3.131 ns/op FloatClassCheck.testIsFiniteCMov avgt 15 7.815 ± 0.014 ns/op FloatClassCheck.testIsFiniteStore avgt 15 6.343 ± 1.475 ns/op After: Benchmark Mode Cnt Score Error Units DoubleClassCheck.testIsFiniteBranchavgt 15 33.915 ± 4.193 ns/op DoubleClassCheck.testIsFiniteCMov avgt 15 7.736 ± 0.012 ns/op DoubleClassCheck.testIsFiniteStore avgt 15 5.931 ± 0.007 ns/op FloatClassCheck.testIsFiniteBranch avgt 15 33.845 ± 3.449 ns/op FloatClassCheck.testIsFiniteCMov avgt 15 7.543 ± 0.013 ns/op FloatClassCheck.testIsFiniteStore avgt 15 6.035 ± 0.006 ns/op Benchmark Mode Cnt Score Error Units DoubleClassCheck.testIsFiniteBranchavgt 15 33.421 ± 4.262 ns/op DoubleClassCheck.testIsFiniteCMov avgt 15 7.734 ± 0.012 ns/op DoubleClassCheck.testIsFiniteStore avgt 15 5.940 ± 0.010 ns/op FloatClassCheck.testIsFiniteBranch avgt 15 33.348 ± 2.656 ns/op FloatClassCheck.testIsFiniteCMov avgt 15 7.542 ± 0.013 ns/op FloatClassCheck.testIsFiniteStore avgt 15 6.040 ± 0.004 ns/op Benchmark Mode Cnt Score Error Units DoubleClassCheck.testIsFiniteBranchavgt 15 31.669 ± 2.517 ns/op DoubleClassCheck.testIsFiniteCMov avgt 15 7.761 ± 0.092 ns/op DoubleClassCheck.testIsFiniteStore avgt 15 5.940 ± 0.007 ns/op FloatClassCheck.testIsFiniteBranch avgt 15 33.508 ± 3.680 ns/op FloatClassCheck.testIsFiniteCMov avgt 15 7.534 ± 0.009 ns/op FloatClassCheck.testIsFiniteStore avgt 15 6.031 ± 0.007 ns/op - PR: https://git.openjdk.org/jdk/pull/10391
Re: RFR: 8292407: Improve Weak CAS VarHandle/Unsafe tests resilience under spurious failures [v4]
On Fri, 19 Aug 2022 18:14:16 GMT, Aleksey Shipilev wrote: >> We have a few reports that existing Weak* VarHandle tests are still flaky, >> for example on large AArch64 machines or small RISC-V machines. >> >> The flakiness is intrinsic to the nature of Weak* operations under tests, >> that can spuriously fail. The last attempt to fix these was >> [JDK-8155739](https://bugs.openjdk.org/browse/JDK-8155739). We need to >> strengthen these a bit more. >> >> The actual values depend on the successful testing on known-failing >> platforms. I ballparked bumping the attempts 5x and introducing the delay >> would help without exploding test time in worst cases. > > Aleksey Shipilev has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains six commits: > > - Rework timeouts > - Merge branch 'master' into JDK-8292407-varhandle-weak-resilient > - Merge branch 'master' into JDK-8292407-varhandle-weak-resilient > - Update copyrights > - Also do Unsafe tests > - Fix The latest changes look good to me. I see all these tests pass stably on my aarch64-linux server. This wll also improve the case for my riscv64-linux unmatched board: except for VarHandleTestMethodHandleAccess*.java, the other tests could also pass stably now. - Marked as reviewed by fyang (Reviewer). PR: https://git.openjdk.org/jdk/pull/9889