Re: RFR: 8314085: Fixing scope from benchmark to thread for JMH tests having shared state

2023-09-05 Thread Sandhya Viswanathan
On Thu, 10 Aug 2023 15:30:19 GMT, Swati Sharma wrote: > In addition to the issue > [JDK-8311178](https://bugs.openjdk.org/browse/JDK-8311178), logically fixing > the scope from benchmark to thread for the benchmark files below that have shared > state, which also fixes a few of the benchmarks

Re: RFR: 8314085: Fixing scope from benchmark to thread for JMH tests having shared state

2023-08-31 Thread Sandhya Viswanathan
On Thu, 10 Aug 2023 15:30:19 GMT, Swati Sharma wrote: > In addition to the issue > [JDK-8311178](https://bugs.openjdk.org/browse/JDK-8311178), logically fixing > the scope from benchmark to thread for the benchmark files below that have shared > state, which also fixes a few of the benchmarks

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30]

2023-08-29 Thread Sandhya Viswanathan
On Tue, 29 Aug 2023 19:28:17 GMT, Alan Bateman wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Clean up parameters passed to arrayPartition; update the check to load >> library > > The changes to

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v3]

2023-10-11 Thread Sandhya Viswanathan
On Tue, 10 Oct 2023 22:29:55 GMT, Vladimir Kozlov wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one >> additional commit since the last revision: >> >> fix whitespace in build script > > Also @forceinline in these changes only works for case when new

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR

2023-10-11 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 09:25:15 GMT, Andrew Haley wrote: > > Forgive me, I might be missing something very obvious, but is there any > > particular reason to entirely disable the SIMD accelerated sort on Zen 4 > > rather than having an alternate code path for Zen 4 where it has the > >

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v4]

2023-10-11 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 17:28:12 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to address the follow-up comments to the SIMD >> accelerated sort PR (#14227) which implemented AVX512 intrinsics for >> Arrays.sort() methods. >> The proposed changes are: >> >> 1) Restriction of the

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

2023-11-03 Thread Sandhya Viswanathan
On Tue, 31 Oct 2023 07:19:55 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather with high performance backend

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

2023-11-02 Thread Sandhya Viswanathan
On Tue, 31 Oct 2023 07:19:55 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather with high performance backend

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

2023-11-06 Thread Sandhya Viswanathan
On Fri, 3 Nov 2023 22:44:39 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Restricting masked sub-word gather to AVX512 target to align with integral >> g

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

2023-11-06 Thread Sandhya Viswanathan
On Sun, 5 Nov 2023 12:58:57 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1606: >> >>> 1604: void C2_MacroAssembler::vpgather8b_offset(BasicType elem_bt, >>> XMMRegister dst, Register base, Register idx_base, >>> 1605:

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30]

2023-09-25 Thread Sandhya Viswanathan
On Wed, 30 Aug 2023 02:01:38 GMT, Vladimir Kozlov wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Clean up parameters passed to arrayPartition; update the check to load >> library > > Good. Thank you.

Re: RFR: 8314544: Matrix multiple benchmark using Vector API

2023-10-03 Thread Sandhya Viswanathan
On Mon, 21 Aug 2023 03:50:32 GMT, Martin Stypinski wrote: >> Added a bunch of different implementations for Vector API Matrix >> Multiplications: >> >> - Baseline >> - Blocked (Cache Local) >> - FMA >> - Vector API Simple Implementation >> - Vector API Blocked Implementation >> >> Commit was

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

2023-09-20 Thread Sandhya Viswanathan
On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX512 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays.

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v5]

2023-10-11 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 23:14:26 GMT, Vladimir Ivanov wrote: > Proposed patch has one disadvantage: there's no way to override ergonomics > decisions on AMD CPUs and forcibly enable the intrinsic without rebuilding > the JVM. > > For many other intrinsics there are flags which enable finer

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42]

2023-10-11 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 23:25:30 GMT, Vladimir Ivanov wrote: >> src/java.base/share/classes/java/util/DualPivotQuicksort.java line 157: >> >>> 155: @ForceInline >>> 156: private static void sort(Class elemType, A array, long >>> offset, int low, int high, SortOperation so) { >>> 157:

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v4]

2023-10-11 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 22:25:14 GMT, Erik Joelsson wrote: >> Hi Erik (@erikj79), >> BUILD_LIBFALLBACKLINKER is from different PR (#13079). If I understand >> correctly, for LIB_SIMD_SORT, are you suggesting that we don't pad the lines >> with spaces to align features into columns and instead

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v3]

2023-10-11 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 18:31:44 GMT, Sandhya Viswanathan wrote: >> Also @forceinline in these changes only works for case when new intrinsics >> are not used. >> I would suggest to adapt/update JMH benchmark to cover all cases and see >> effect @forceinline without intri

Re: RFR: 8314544: Matrix multiply benchmark using Vector API [v2]

2023-10-06 Thread Sandhya Viswanathan
On Fri, 6 Oct 2023 08:32:28 GMT, Martin Stypinski wrote: >> Martin Stypinski has updated the pull request incrementally with two >> additional commits since the last revision: >> >> - changed for consistency >> - improved some RandomGenerator & unuseed Imports > > fixed typo. @Styp Thanks,

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42]

2023-10-13 Thread Sandhya Viswanathan
On Fri, 13 Oct 2023 10:31:14 GMT, himichael wrote: >> @himichael Please refer to [this >> question](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) >> for how to correctly benchmark Java code. > >> @himichael Please refer to [this >>

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v29]

2023-08-25 Thread Sandhya Viswanathan
On Fri, 25 Aug 2023 18:46:53 GMT, Vladimir Kozlov wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Remove unnecessary import in Arrays.java > > After I fixed it Tier1 passed and I submitted other tiers.

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30]

2023-08-28 Thread Sandhya Viswanathan
On Mon, 28 Aug 2023 21:27:25 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX512 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays.

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v5]

2023-11-09 Thread Sandhya Viswanathan
On Fri, 10 Nov 2023 01:25:49 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Review comments resolutions. > > src/hotspot/cpu/x86/c2_MacroAssembler

RFR: 8319572: Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags

2023-11-09 Thread Sandhya Viswanathan
Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and is thus marked as flagless through @requires vm.flagless per [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566). - Commit messages: - Mark LoadJsvmlTest.java test as flagless Changes:

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v5]

2023-11-09 Thread Sandhya Viswanathan
On Thu, 9 Nov 2023 18:56:19 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather with high performance backend implementation

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v5]

2022-08-24 Thread Sandhya Viswanathan
On Wed, 24 Aug 2022 23:48:36 GMT, Smita Kamath wrote: >> 8289552: Make intrinsic conversions between bit representations of half >> precision values and floats > > Smita Kamath has updated the pull request incrementally with one additional > commit since the last revision: > > Updated

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v8]

2022-09-01 Thread Sandhya Viswanathan
On Fri, 2 Sep 2022 00:52:49 GMT, Smita Kamath wrote: >> 8289552: Make intrinsic conversions between bit representations of half >> precision values and floats > > Smita Kamath has updated the pull request incrementally with one additional > commit since the last revision: > > Addressed

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v6]

2022-09-01 Thread Sandhya Viswanathan
On Thu, 1 Sep 2022 18:31:07 GMT, Smita Kamath wrote: >> 8289552: Make intrinsic conversions between bit representations of half >> precision values and floats > > Smita Kamath has updated the pull request incrementally with one additional > commit since the last revision: > > Addressed

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v5]

2022-09-01 Thread Sandhya Viswanathan
On Thu, 1 Sep 2022 18:26:52 GMT, Smita Kamath wrote: >> src/hotspot/cpu/x86/x86_64.ad line 11330: >> >>> 11328: ins_pipe( pipe_slow ); >>> 11329: %} >>> 11330: >> >> For HF2F, good to also add optimized rule with LoadS to benefit from >> vcvtph2ps memory src form of instruction. >>

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v7]

2022-09-01 Thread Sandhya Viswanathan
On Thu, 1 Sep 2022 23:22:46 GMT, Smita Kamath wrote: >> 8289552: Make intrinsic conversions between bit representations of half >> precision values and floats > > Smita Kamath has updated the pull request incrementally with one additional > commit since the last revision: > > Added missing

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v8]

2022-09-20 Thread Sandhya Viswanathan
On Fri, 2 Sep 2022 00:52:49 GMT, Smita Kamath wrote: >> 8289552: Make intrinsic conversions between bit representations of half >> precision values and floats > > Smita Kamath has updated the pull request incrementally with one additional > commit since the last revision: > > Addressed

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats [v8]

2022-09-29 Thread Sandhya Viswanathan
On Thu, 29 Sep 2022 18:34:41 GMT, Vladimir Kozlov wrote: >> @vnkozlov I have addressed all review comments. Could you please run the >> patch through your testing? Thanks a lot for all the help. > > @smita-kamath I have builds failures. Please, build and test yourself to > verify changes. > >

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats

2022-08-08 Thread Sandhya Viswanathan
On Fri, 5 Aug 2022 23:58:49 GMT, Joe Darcy wrote: >> @jddarcy Thanks for your comment. I am not sure if there is a way of using >> Java library implementation here. > > I was under the impression that if a platform didn't have special support for > the functionality in question it could not

Re: RFR: 8289552: Make intrinsic conversions between bit representations of half precision values and floats

2022-08-08 Thread Sandhya Viswanathan
On Fri, 5 Aug 2022 16:36:23 GMT, Smita Kamath wrote: > 8289552: Make intrinsic conversions between bit representations of half > precision values and floats src/hotspot/cpu/x86/assembler_x86.cpp line 1927: > 1925: assert(VM_Version::supports_evex(), ""); > 1926: InstructionAttr

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2023-01-06 Thread Sandhya Viswanathan
On Thu, 22 Dec 2022 13:10:02 GMT, Claes Redestad wrote: >> @cl4es Thanks for passing the constant node through, the code looks much >> cleaner now. The attached patch should handle the signed bytes/shorts as >> well. Please take a look. >>

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v18]

2023-01-09 Thread Sandhya Viswanathan
On Mon, 9 Jan 2023 23:13:29 GMT, Claes Redestad wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Explicitly lea external address > > Explicitly loading the address to a register seems to do the trick, avoiding >

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2022-12-16 Thread Sandhya Viswanathan
On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify >> polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` >> method. To make this work I've harmonized how they

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2022-12-16 Thread Sandhya Viswanathan
On Sun, 13 Nov 2022 20:57:44 GMT, Claes Redestad wrote: >> src/hotspot/cpu/x86/x86_64.ad line 12073: >> >>> 12071: legRegD tmp_vec13, rRegI tmp1, rRegI tmp2, >>> rRegI tmp3, rFlagsReg cr) >>> 12072: %{ >>> 12073: predicate(UseAVX >= 2 &&

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2022-12-16 Thread Sandhya Viswanathan
On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify >> polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` >> method. To make this work I've harmonized how they

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v14]

2022-12-20 Thread Sandhya Viswanathan
On Tue, 20 Dec 2022 21:11:40 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify >> polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` >> method. To make this work I've harmonized how they

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2022-12-20 Thread Sandhya Viswanathan
On Tue, 20 Dec 2022 19:52:34 GMT, Claes Redestad wrote: >> src/java.base/share/classes/java/lang/StringUTF16.java line 418: >> >>> 416: return 0; >>> 417: } else { >>> 418: return ArraysSupport.vectorizedHashCode(value, >>> ArraysSupport.UTF16); >> >> Special

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2022-12-20 Thread Sandhya Viswanathan
On Tue, 20 Dec 2022 21:11:18 GMT, Claes Redestad wrote: >>> How far off is this ...? >> >> Back then it looked way too constrained (tight constraints on code shapes). >> But I considered it as a generally applicable optimization. >> >>> ... do you think it'll be able to match the efficiency

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v16]

2022-12-21 Thread Sandhya Viswanathan
On Wed, 21 Dec 2022 17:29:23 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify >> polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` >> method. To make this work I've harmonized how they

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2022-11-21 Thread Sandhya Viswanathan
On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify >> polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` >> method. To make this work I've harmonized how they

Re: RFR: JDK-8301092 - Add benchmark for CRC32

2023-01-25 Thread Sandhya Viswanathan
On Wed, 25 Jan 2023 15:03:05 GMT, Scott Gibbons wrote: > Adding a performance benchmark test for CRC32. This does exactly the same > test as for CRC32C. test/micro/org/openjdk/bench/java/util/TestCRC32.java line 2: > 1: /* > 2: * Copyright (c) 2021, 2022, 2023, Oracle and/or its affiliates.

Re: RFR: JDK-8301092 - Add benchmark for CRC32 [v3]

2023-01-25 Thread Sandhya Viswanathan
On Wed, 25 Jan 2023 23:07:49 GMT, Scott Gibbons wrote: >> Adding a performance benchmark test for CRC32. This does exactly the same >> test as for CRC32C. > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Fix copyright

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-03-06 Thread Sandhya Viswanathan
On Mon, 6 Mar 2023 23:54:44 GMT, Vladimir Kozlov wrote: >> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in >> Interpreter and C1 compiler to produce the same results as C2 intrinsics on >> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-03-06 Thread Sandhya Viswanathan
On Tue, 7 Mar 2023 00:52:37 GMT, Vladimir Kozlov wrote: > Note, I removed `ConvF2HFNode::Identity()` optimization because tests show > that it produces different NaN results due to skipped conversion. Yes, removing the Identity optimization is correct. It doesn't hold for NaN inputs.
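
A minimal sketch of why such an identity can fail for NaN inputs, assuming a conversion that quiets signaling NaNs (the thread states that hardware instructions generate QNaNs rather than SNaNs). The class and method names are hypothetical, and the code is an integer-only model of the binary16 and binary32 bit layouts, not the HotSpot implementation. It shows that a half -> float -> half round trip returns a different bit pattern for a signaling NaN, so replacing the pair of conversions with the original input would change the result.

```java
// Hypothetical model, not HotSpot code: integer-only sketch of NaN handling
// in binary16 <-> binary32 conversions that quiet signaling NaNs.
final class Float16NaNRoundTrip {
    static final int HF_EXP_MASK = 0x7C00;     // 5 exponent bits of binary16
    static final int HF_SIG_MASK = 0x03FF;     // 10 significand bits of binary16
    static final int F_EXP_MASK  = 0x7F800000; // 8 exponent bits of binary32
    static final int F_QUIET_BIT = 0x00400000; // top significand bit of binary32

    // A binary16 value is NaN when the exponent is all ones and the
    // significand is non-zero.
    static boolean isNaN16(int h) {
        return (h & HF_EXP_MASK) == HF_EXP_MASK && (h & HF_SIG_MASK) != 0;
    }

    // Assumed behaviour for NaN inputs only: widen the 10-bit payload to the
    // top of the 23-bit significand and set the quiet bit.
    static int nanHalfToFloatBits(int h) {
        int sign = (h & 0x8000) << 16;
        int payload = (h & HF_SIG_MASK) << 13;
        return sign | F_EXP_MASK | payload | F_QUIET_BIT;
    }

    // NaN inputs only: keep the top 10 significand bits of the float.
    static int nanFloatBitsToHalf(int f) {
        int sign = (f >>> 16) & 0x8000;
        int payload = (f >>> 13) & HF_SIG_MASK;
        return sign | HF_EXP_MASK | payload;
    }

    // half -> float -> half, the pattern an identity rule would fold away.
    static int roundTrip(int h) {
        return nanFloatBitsToHalf(nanHalfToFloatBits(h));
    }

    public static void main(String[] args) {
        System.out.printf("sNaN 0x7D00 -> 0x%04X%n", roundTrip(0x7D00));
        System.out.printf("qNaN 0x7E00 -> 0x%04X%n", roundTrip(0x7E00));
    }
}
```

In this model a quiet NaN survives the round trip unchanged, while a signaling NaN comes back with its quiet bit set. That difference is the kind of mismatch a skipped conversion would hide.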

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-03-06 Thread Sandhya Viswanathan
On Fri, 3 Mar 2023 21:41:35 GMT, Vladimir Kozlov wrote: > Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in > Interpreter and C1 compiler to produce the same results as C2 intrinsics on > x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java >

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-03-06 Thread Sandhya Viswanathan
On Tue, 7 Mar 2023 01:59:25 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3931: >> >>> 3929: // For results consistency both intrinsics should be enabled. >>> 3930: if >>> (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_float16ToFloat) && >>>

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-03-06 Thread Sandhya Viswanathan
On Fri, 3 Mar 2023 21:41:35 GMT, Vladimir Kozlov wrote: > Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in > Interpreter and C1 compiler to produce the same results as C2 intrinsics on > x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java >

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter [v2]

2023-03-07 Thread Sandhya Viswanathan
On Tue, 7 Mar 2023 02:53:48 GMT, Vladimir Kozlov wrote: >> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in >> Interpreter and C1 compiler to produce the same results as C2 intrinsics on >> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-02-22 Thread Sandhya Viswanathan
On Wed, 22 Feb 2023 04:03:02 GMT, David Holmes wrote: >> Change the java/lang/float.java and the corresponding shared runtime >> constant expression evaluation to generate QNaN. >> The HW instructions generate QNaNs and not SNaNs for floating point >> instructions. This happens across double,

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-02-22 Thread Sandhya Viswanathan
On Wed, 22 Feb 2023 21:21:42 GMT, Vladimir Kozlov wrote: >>> I'm also a bit concerned that we are rushing in to "fix" this. IIUC we have >>> three mechanisms for implementing this functionality: >>> >>> 1. The interpreted Java code >>> >>> 2. The compiled non-intrinisc sharedRuntime

Re: RFR: 8303401: Add a Vector API equalsIgnoreCase micro benchmark

2023-02-28 Thread Sandhya Viswanathan
On Tue, 28 Feb 2023 15:59:26 GMT, Eirik Bjorsnos wrote: > This PR suggests we add a vectorized equalsIgnoreCase benchmark to the set of > benchmarks in `org.openjdk.bench.jdk.incubator.vector`. This benchmark serves > as an example of how vectorization can be useful also in the area of text >

Re: RFR: 8303401: Add a Vector API equalsIgnoreCase micro benchmark [v3]

2023-02-28 Thread Sandhya Viswanathan
On Tue, 28 Feb 2023 23:08:29 GMT, Eirik Bjorsnos wrote: >> This PR suggests we add a vectorized equalsIgnoreCase benchmark to the set >> of benchmarks in `org.openjdk.bench.jdk.incubator.vector`. This benchmark >> serves as an example of how vectorization can be useful also in the area of >>

RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-02-21 Thread Sandhya Viswanathan
Change the java/lang/float.java and the corresponding shared runtime constant expression evaluation to generate QNaN. The HW instructions generate QNaNs and not SNaNs for floating point instructions. This happens across double, float, and float16 data types. The most significant bit of mantissa

Withdrawn: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-02-23 Thread Sandhya Viswanathan
On Wed, 22 Feb 2023 02:08:27 GMT, Sandhya Viswanathan wrote: > Change the java/lang/float.java and the corresponding shared runtime constant > expression evaluation to generate QNaN. > The HW instructions generate QNaNs and not SNaNs for floating point > instructions. This ha

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-02-23 Thread Sandhya Viswanathan
On Wed, 22 Feb 2023 02:08:27 GMT, Sandhya Viswanathan wrote: > Change the java/lang/float.java and the corresponding shared runtime constant > expression evaluation to generate QNaN. > The HW instructions generate QNaNs and not SNaNs for floating point > instructions. This ha

Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v11]

2023-02-06 Thread Sandhya Viswanathan
On Tue, 7 Feb 2023 00:12:21 GMT, Scott Gibbons wrote: >> Added code for Base64 acceleration (encode and decode) which will accelerate >> ~4x for AVX2 platforms. >> >> Encode performance: >> **Old:** >> >> Benchmark (maxNumBytes) Mode Cnt Score Error >> Units

Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v11]

2023-02-07 Thread Sandhya Viswanathan
On Tue, 7 Feb 2023 02:49:44 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Add algorithm comments > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp l

Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v15]

2023-02-13 Thread Sandhya Viswanathan
On Thu, 9 Feb 2023 18:08:15 GMT, Scott Gibbons wrote: >> Added code for Base64 acceleration (encode and decode) which will accelerate >> ~4x for AVX2 platforms. >> >> Encode performance: >> **Old:** >> >> Benchmark (maxNumBytes) Mode Cnt Score Error >> Units

Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v17]

2023-02-14 Thread Sandhya Viswanathan
On Tue, 14 Feb 2023 22:41:47 GMT, Claes Redestad wrote: >> Scott Gibbons has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Last of review comments > > I've started tier1-5 testing internally. Will let you know if we find any > issues.

Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v15]

2023-02-14 Thread Sandhya Viswanathan
On Tue, 14 Feb 2023 15:03:49 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2658: >> >>> 2656: // Check for buffer too small (for algorithm) >>> 2657: __ subl(length, 0x2c); >>> 2658: __ jcc(Assembler::lessEqual, L_tailProc); >> >> This could be

Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v15]

2023-02-14 Thread Sandhya Viswanathan
On Tue, 14 Feb 2023 15:19:34 GMT, Claes Redestad wrote: >> Why? There is no performance difference and the intent is clear. Is this >> just a "style" thing? > > I think with `lessEqual` we'll jump to `L_tailProc` for the final 32-byte > chunk in inputs that are divisible by 32 (starting from

Re: RFR: JDK-8300808: Accelerate Base64 on x86 for AVX2 [v17]

2023-02-14 Thread Sandhya Viswanathan
On Tue, 14 Feb 2023 18:22:32 GMT, Scott Gibbons wrote: >> Added code for Base64 acceleration (encode and decode) which will accelerate >> ~4x for AVX2 platforms. >> >> Encode performance: >> **Old:** >> >> Benchmark (maxNumBytes) Mode Cnt Score Error >> Units

Re: RFR: 8311178: JMH tests don't scale well when sharing output buffers

2023-07-10 Thread Sandhya Viswanathan
On Sat, 1 Jul 2023 07:53:17 GMT, Swati Sharma wrote: > The benchmark files below have scaling issues due to cache contention, which > leads to poor scaling when run on multiple threads. The patch sets the scope > from benchmark level to thread level to fix the issue: > -

Re: RFR: 8310459: [BACKOUT] 8304450: [vectorapi] Refactor VectorShuffle implementation

2023-06-26 Thread Sandhya Viswanathan
On Fri, 23 Jun 2023 16:43:32 GMT, Jatin Bhateja wrote: > Backing out shuffle related overhaul done with > [JDK-8304450](https://bugs.openjdk.org/browse/JDK-8304450), we saw > significant performance degradation in VectorAPI JMH micros and some of our > internal benchmarks. Following two

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v8]

2024-01-22 Thread Sandhya Viswanathan
On Sat, 20 Jan 2024 09:55:45 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v13]

2024-02-07 Thread Sandhya Viswanathan
On Wed, 7 Feb 2024 18:38:29 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather using hybrid algorithm which initially >>

Re: RFR: 8321688: Build on linux with GCC 7.5.0 fails after 8319577 [v2]

2023-12-18 Thread Sandhya Viswanathan
On Sun, 17 Dec 2023 13:25:00 GMT, Guoxiong Li wrote: >> Hi all, >> >> This patch fixes the building failure introduced by >> [JDK-8319577](https://bugs.openjdk.org/browse/JDK-8319577) in old GCC >> version (linux & GCC 7.5.0 locally). >> >> Thanks for the review. >> >> Best Regards, >> --

Re: RFR: 8321688: Build on linux with GCC 7.5.0 fails after 8319577 [v2]

2023-12-19 Thread Sandhya Viswanathan
On Tue, 19 Dec 2023 19:08:08 GMT, Kim Barrett wrote: >>> Have you tested with gcc 9? Or is this just supposition based on gcc9 >>> having removed the experimental >> status for C++17? >> >> I have not tested GCC 8 and 9. @sviswa7 seems to test them. >> >>> I have verified that with the above

Re: RFR: 8321688: Build on linux with GCC 7.5.0 fails after 8319577 [v2]

2023-12-19 Thread Sandhya Viswanathan
On Sun, 17 Dec 2023 13:25:00 GMT, Guoxiong Li wrote: >> Hi all, >> >> This patch fixes the building failure introduced by >> [JDK-8319577](https://bugs.openjdk.org/browse/JDK-8319577) in old GCC >> version (linux & GCC 7.5.0 locally). >> >> Thanks for the review. >> >> Best Regards, >> --

Re: RFR: 8321688: Build on linux with GCC 7.5.0 fails after 8319577 [v2]

2023-12-19 Thread Sandhya Viswanathan
On Tue, 19 Dec 2023 02:22:05 GMT, Guoxiong Li wrote: >> Guoxiong Li has updated the pull request with a new target base due to a >> merge or a rebase. The incremental webrev excludes the unrelated changes >> brought in by the merge/rebase. The pull request contains four additional >> commits

Re: RFR: JDK-8320448 Accelerate IndexOf using AVX2 [v4]

2023-12-20 Thread Sandhya Viswanathan
On Tue, 19 Dec 2023 18:42:19 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: JDK-8320448 Accelerate IndexOf using AVX2 [v2]

2023-12-20 Thread Sandhya Viswanathan
On Wed, 29 Nov 2023 15:01:32 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v2]

2023-11-29 Thread Sandhya Viswanathan
On Tue, 28 Nov 2023 20:52:35 GMT, Srinivas Vamsi Parasa wrote: >> Thanks Sandhya, will fix this issue. > > Thanks Sandhya for suggesting the change to use supports_simd_sort(BasicType > bt). Please see the updated code upstreamed. @vamsi-parasa Thanks, your changes look good to me.

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]

2023-12-04 Thread Sandhya Viswanathan
On Mon, 4 Dec 2023 22:15:24 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX2 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >>

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v2]

2023-11-27 Thread Sandhya Viswanathan
On Sat, 18 Nov 2023 01:21:09 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX2 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >>

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v2]

2023-11-27 Thread Sandhya Viswanathan
On Tue, 21 Nov 2023 15:14:28 GMT, Dalibor Topic wrote: >> src/java.base/linux/native/libsimdsort/avx2-32bit-qsort.hpp line 3: >> >>> 1: /* >>> 2: * Copyright (c) 2021, 2023, Intel Corporation. All rights reserved. >>> 3: * Copyright (c) 2021 Serge Sans Paille. All rights reserved. >> >> Is

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v10]

2023-12-06 Thread Sandhya Viswanathan
On Wed, 6 Dec 2023 17:48:04 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX2 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >>

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v10]

2023-12-06 Thread Sandhya Viswanathan
On Wed, 6 Dec 2023 18:26:34 GMT, Vladimir Kozlov wrote: >> @TobiHartmann @vnkozlov Please advise if we can go ahead and integrate this >> PR today before the fork. > >> @TobiHartmann @vnkozlov Please advise if we can go ahead and integrate this >> PR today before the fork. > > Too late. Changes

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v11]

2024-01-31 Thread Sandhya Viswanathan
On Sun, 21 Jan 2024 06:55:43 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather using hybrid algorithm which initially >>

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v11]

2024-01-31 Thread Sandhya Viswanathan
On Wed, 31 Jan 2024 21:31:21 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Review comments resolutions. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v12]

2024-02-02 Thread Sandhya Viswanathan
On Thu, 1 Feb 2024 16:24:16 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather using hybrid algorithm which initially >>

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v7]

2024-01-19 Thread Sandhya Viswanathan
On Fri, 19 Jan 2024 19:03:31 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v7]

2024-01-19 Thread Sandhya Viswanathan
On Fri, 19 Jan 2024 19:03:31 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v7]

2024-01-19 Thread Sandhya Viswanathan
On Fri, 19 Jan 2024 19:03:31 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

2023-11-15 Thread Sandhya Viswanathan
On Mon, 6 Nov 2023 18:37:41 GMT, Sandhya Viswanathan wrote: >> match_rule_supported_vector called in the beginning will enforce these >> checks. > > This method is match_rule_support_vector and it is not enforcing this check > now. It was doing so before through fall thr

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v7]

2023-11-15 Thread Sandhya Viswanathan
On Wed, 15 Nov 2023 02:17:58 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather with high performance backend

Re: RFR: 8310159: Bulk copy with Unsafe::arrayCopy is slower compared to memcpy

2023-11-14 Thread Sandhya Viswanathan
On Tue, 14 Nov 2023 08:09:28 GMT, Jatin Bhateja wrote: >> Below is baseline data collected using a modified version of the >> java.lang.foreign.xor micro benchmark referenced by @mcimadamore in the bug >> report. I collected data on an Ubuntu 22.04 laptop with a Tigerlake >> i7-1185G7,

Re: RFR: 8319572: Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags

2023-11-14 Thread Sandhya Viswanathan
On Thu, 9 Nov 2023 22:08:06 GMT, Sandhya Viswanathan wrote: > Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and thus marked > as flagless through @requires vm.flagless per > [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566). @lmesnik Could you plea

Re: RFR: 8319572: Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags

2023-11-14 Thread Sandhya Viswanathan
On Wed, 15 Nov 2023 01:07:23 GMT, Leonid Mesnik wrote: >> Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and thus >> marked as flagless through @requires vm.flagless per >> [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566). > > Marked as reviewed by lmesnik (Reviewer).

Integrated: 8319572: Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags

2023-11-14 Thread Sandhya Viswanathan
On Thu, 9 Nov 2023 22:08:06 GMT, Sandhya Viswanathan wrote: > Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and thus marked > as flagless through @requires vm.flagless per > [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566). This pull request has now been i

Re: RFR: 8310159: Bulk copy with Unsafe::arrayCopy is slower compared to memcpy [v5]

2023-11-20 Thread Sandhya Viswanathan
On Mon, 20 Nov 2023 22:50:19 GMT, Steve Dohrmann wrote: >> Update: the XorTest::xor results shown in this message used test code from >> PR commit 7cc272e862791 which was based on Maurizio Cimadamore's commit >> a788f066af17. The XorTest has since been updated and XorTest::copy is no >>

Re: RFR: 8310159: Bulk copy with Unsafe::arrayCopy is slower compared to memcpy [v6]

2023-11-21 Thread Sandhya Viswanathan
On Tue, 21 Nov 2023 21:03:20 GMT, Steve Dohrmann wrote: >> Update: the XorTest::xor results shown in this message used test code from >> PR commit 7cc272e862791 which was based on Maurizio Cimadamore's commit >> a788f066af17. The XorTest has since been updated and XorTest::copy is no >>

Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v13]

2024-04-11 Thread Sandhya Viswanathan
On Fri, 12 Apr 2024 00:10:22 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Addressing yet more review comments > > src/hotspot/cpu/x86/stubGenerator_

Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v12]

2024-04-11 Thread Sandhya Viswanathan
On Fri, 12 Apr 2024 00:00:38 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2751: >> >>> 2749: UnsafeSetMemoryMark usmm(this, true, true); >>> 2750: >>> 2751: __ generate_fill(T_BYTE, false, c_rarg0, c_rarg1, r11, rax, >>> xmm0); >>
