Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v47]

2024-05-28 Thread Sandhya Viswanathan
On Tue, 28 May 2024 23:52:27 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Sandhya Viswanathan
On Tue, 28 May 2024 18:11:13 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1333: >> >>> 1331: >>> 1332: __ cmpq(nMinusK, 32); >>> 1333: __ jae_b(L_greaterThan32); >> >> Should this check be (n-k+1) >= 32? And so accordingly (n-k) >= 31 >> __

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Sandhya Viswanathan
On Tue, 28 May 2024 17:30:24 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 278: >> >>> 276: __ bind(L_nextCheck); >>> 277: __ testq(haystack_len_p, haystack_len_p); >>> 278: __ je(L_zeroCheckFailed); >> >> This check could be removed as the next

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Sandhya Viswanathan
On Tue, 28 May 2024 17:59:49 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 578: >> >>> 576: // helper jumps to L_checkRangeAndReturn with a (-1) return value. >>> 577: big_case_loop_helper(false, 0, L_checkRangeAndReturn, L_loopTop, >>> mask,

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Sandhya Viswanathan
On Sat, 25 May 2024 22:19:41 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v35]

2024-05-24 Thread Sandhya Viswanathan
On Thu, 23 May 2024 23:12:42 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v41]

2024-05-24 Thread Sandhya Viswanathan
On Fri, 24 May 2024 23:15:26 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v20]

2024-05-24 Thread Sandhya Viswanathan
On Fri, 17 May 2024 23:47:45 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v27]

2024-05-24 Thread Sandhya Viswanathan
On Wed, 22 May 2024 18:52:27 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v25]

2024-05-24 Thread Sandhya Viswanathan
On Wed, 22 May 2024 17:40:24 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v40]

2024-05-24 Thread Sandhya Viswanathan
On Fri, 24 May 2024 20:47:23 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v35]

2024-05-24 Thread Sandhya Viswanathan
On Thu, 23 May 2024 23:12:42 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v20]

2024-05-21 Thread Sandhya Viswanathan
On Fri, 17 May 2024 23:47:45 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v19]

2024-05-17 Thread Sandhya Viswanathan
On Thu, 16 May 2024 17:08:21 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 238: >> >>> 236: const Register needle = rdx; >>> 237: const Register needle_len = rcx; >>> 238: >> >> This is the calling convention on Linux. How is windows

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v19]

2024-05-17 Thread Sandhya Viswanathan
On Thu, 16 May 2024 20:22:40 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1510: >> >>> 1508: compare_big_haystack_to_needle(sizeKnown, size, >>> NUMBER_OF_NEEDLE_BYTES_TO_COMPARE, loop_top, hsPtrRet, hsLength, >>> 1509:

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]

2024-05-17 Thread Sandhya Viswanathan
On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski wrote: >> Performance. Before: >> >> Benchmark (algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v9]

2024-05-16 Thread Sandhya Viswanathan
On Fri, 10 May 2024 00:19:32 GMT, Volodymyr Paprotski wrote: >> Performance. Before: >> >> Benchmark (algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v19]

2024-05-15 Thread Sandhya Viswanathan
On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v19]

2024-05-14 Thread Sandhya Viswanathan
On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v19]

2024-05-13 Thread Sandhya Viswanathan
On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v19]

2024-05-13 Thread Sandhya Viswanathan
On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v19]

2024-05-07 Thread Sandhya Viswanathan
On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v19]

2024-05-07 Thread Sandhya Viswanathan
On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v19]

2024-05-07 Thread Sandhya Viswanathan
On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v19]

2024-05-06 Thread Sandhya Viswanathan
On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v19]

2024-05-06 Thread Sandhya Viswanathan
On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v21]

2024-04-18 Thread Sandhya Viswanathan
On Tue, 16 Apr 2024 00:04:15 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See >> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around >> this change. >> >> Overall, making this an intrinsic improves overall

Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v21]

2024-04-15 Thread Sandhya Viswanathan
On Tue, 16 Apr 2024 00:04:15 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See >> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around >> this change. >> >> Overall, making this an intrinsic improves overall

Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v20]

2024-04-15 Thread Sandhya Viswanathan
On Mon, 15 Apr 2024 23:01:21 GMT, Jorn Vernee wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2686: >> >>> 2684: __ movq(rdx, rsi); >>> 2685: restore_arg_regs(); >>> 2686: #endif >> >> This is stubGenerator_x86_64.cpp 64bit specific, so WIN32 portion could be

Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v14]

2024-04-15 Thread Sandhya Viswanathan
On Fri, 12 Apr 2024 16:47:58 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See >> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around >> this change. >> >> Overall, making this an intrinsic improves overall

Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v20]

2024-04-15 Thread Sandhya Viswanathan
On Mon, 15 Apr 2024 18:43:24 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See >> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around >> this change. >> >> Overall, making this an intrinsic improves overall

Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v13]

2024-04-11 Thread Sandhya Viswanathan
On Fri, 12 Apr 2024 00:10:22 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Addressing yet more review comments > > src/hotspot/cpu/x86/stubGenerator_

Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v13]

2024-04-11 Thread Sandhya Viswanathan
On Fri, 12 Apr 2024 00:07:56 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See >> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around >> this change. >> >> Overall, making this an intrinsic improves overall

Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v12]

2024-04-11 Thread Sandhya Viswanathan
On Fri, 12 Apr 2024 00:00:38 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2751: >> >>> 2749: UnsafeSetMemoryMark usmm(this, true, true); >>> 2750: >>> 2751: __ generate_fill(T_BYTE, false, c_rarg0, c_rarg1, r11, rax, >>> xmm0); >>

Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v12]

2024-04-11 Thread Sandhya Viswanathan
On Thu, 11 Apr 2024 21:47:01 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See >> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around >> this change. >> >> Overall, making this an intrinsic improves overall

Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v11]

2024-04-11 Thread Sandhya Viswanathan
On Thu, 11 Apr 2024 20:58:00 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 735: >> >>> 733: >>> 734: if (MaxVectorSize == 64) { >>> 735: UnsafeCopyMemoryMark ucmm(this, !is_oop && !aligned, false, >>> ucme_exit_pc); >> >> This is not related

Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v11]

2024-04-11 Thread Sandhya Viswanathan
On Thu, 11 Apr 2024 18:42:56 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See >> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around >> this change. >> >> Overall, making this an intrinsic improves overall

Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v7]

2024-04-10 Thread Sandhya Viswanathan
On Mon, 8 Apr 2024 19:11:19 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See >> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around >> this change. >> >> Overall, making this an intrinsic improves overall

Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v7]

2024-04-10 Thread Sandhya Viswanathan
On Mon, 8 Apr 2024 19:11:19 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See >> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around >> this change. >> >> Overall, making this an intrinsic improves overall

Re: RFR: 8329331: Intrinsify Unsafe::setMemory [v7]

2024-04-10 Thread Sandhya Viswanathan
On Mon, 8 Apr 2024 19:11:19 GMT, Scott Gibbons wrote: >> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See >> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around >> this change. >> >> Overall, making this an intrinsic improves overall

Re: RFR: 8328074: Add jcheck whitespace checking for assembly files [v4]

2024-03-16 Thread Sandhya Viswanathan
On Wed, 13 Mar 2024 11:59:36 GMT, Magnus Ihse Bursie wrote: >> As part of the ongoing effort to enable jcheck whitespace checking to all >> text files, it is now time to address assembly (.S) files. The hotspot >> assembly files were fixed as part of the hotspot mapfile removal, so only a >>

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v13]

2024-02-07 Thread Sandhya Viswanathan
On Wed, 7 Feb 2024 18:38:29 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather using hybrid algorithm which initially >>

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v12]

2024-02-02 Thread Sandhya Viswanathan
On Thu, 1 Feb 2024 16:24:16 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather using hybrid algorithm which initially >>

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v11]

2024-01-31 Thread Sandhya Viswanathan
On Wed, 31 Jan 2024 21:31:21 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Review comments resolutions. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v11]

2024-01-31 Thread Sandhya Viswanathan
On Sun, 21 Jan 2024 06:55:43 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather using hybrid algorithm which initially >>

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v8]

2024-01-22 Thread Sandhya Viswanathan
On Sat, 20 Jan 2024 09:55:45 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v7]

2024-01-19 Thread Sandhya Viswanathan
On Fri, 19 Jan 2024 19:03:31 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v7]

2024-01-19 Thread Sandhya Viswanathan
On Fri, 19 Jan 2024 19:03:31 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs

Re: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v7]

2024-01-19 Thread Sandhya Viswanathan
On Fri, 19 Jan 2024 19:03:31 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 >> only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 >> instruction set. >> These are very frequently used APIs

Re: RFR: JDK-8320448 Accelerate IndexOf using AVX2 [v2]

2023-12-20 Thread Sandhya Viswanathan
On Wed, 29 Nov 2023 15:01:32 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: JDK-8320448 Accelerate IndexOf using AVX2 [v4]

2023-12-20 Thread Sandhya Viswanathan
On Tue, 19 Dec 2023 18:42:19 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8321688: Build on linux with GCC 7.5.0 fails after 8319577 [v2]

2023-12-19 Thread Sandhya Viswanathan
On Sun, 17 Dec 2023 13:25:00 GMT, Guoxiong Li wrote: >> Hi all, >> >> This patch fixes the building failure introduced by >> [JDK-8319577](https://bugs.openjdk.org/browse/JDK-8319577) in old GCC >> version (linux & GCC 7.5.0 locally). >> >> Thanks for the review. >> >> Best Regards, >> --

Re: RFR: 8321688: Build on linux with GCC 7.5.0 fails after 8319577 [v2]

2023-12-19 Thread Sandhya Viswanathan
On Tue, 19 Dec 2023 19:08:08 GMT, Kim Barrett wrote: >>> Have you tested with gcc 9? Or is this just supposition based on gcc9 >>> having removed the experimental >> status for C++17? >> >> I have not tested GCC 8 and 9. @sviswa7 seems to test them. >> >>> I have verified that with the above

Re: RFR: 8321688: Build on linux with GCC 7.5.0 fails after 8319577 [v2]

2023-12-19 Thread Sandhya Viswanathan
On Tue, 19 Dec 2023 02:22:05 GMT, Guoxiong Li wrote: >> Guoxiong Li has updated the pull request with a new target base due to a >> merge or a rebase. The incremental webrev excludes the unrelated changes >> brought in by the merge/rebase. The pull request contains four additional >> commits

Re: RFR: 8321688: Build on linux with GCC 7.5.0 fails after 8319577 [v2]

2023-12-18 Thread Sandhya Viswanathan
On Sun, 17 Dec 2023 13:25:00 GMT, Guoxiong Li wrote: >> Hi all, >> >> This patch fixes the building failure introduced by >> [JDK-8319577](https://bugs.openjdk.org/browse/JDK-8319577) in old GCC >> version (linux & GCC 7.5.0 locally). >> >> Thanks for the review. >> >> Best Regards, >> --

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v10]

2023-12-06 Thread Sandhya Viswanathan
On Wed, 6 Dec 2023 18:26:34 GMT, Vladimir Kozlov wrote: >> @TobiHartmann @vnkozlov Please advise if we can go ahead and integrate this >> PR today before the fork. > >> @TobiHartmann @vnkozlov Please advise if we can go ahead and integrate this >> PR today before the fork. > > Too late. Changes

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v10]

2023-12-06 Thread Sandhya Viswanathan
On Wed, 6 Dec 2023 17:48:04 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX2 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >>

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]

2023-12-04 Thread Sandhya Viswanathan
On Mon, 4 Dec 2023 22:15:24 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX2 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >>

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v2]

2023-11-29 Thread Sandhya Viswanathan
On Tue, 28 Nov 2023 20:52:35 GMT, Srinivas Vamsi Parasa wrote: >> Thanks Sandhya, will fix this issue. > > Thanks Sandhya for suggesting the change to use supports_simd_sort(BasicType > bt). Please see the updated code upstreamed. @vamsi-parasa Thanks, your changes look good to me.

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v2]

2023-11-27 Thread Sandhya Viswanathan
On Sat, 18 Nov 2023 01:21:09 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX2 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >>

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v2]

2023-11-27 Thread Sandhya Viswanathan
On Tue, 21 Nov 2023 15:14:28 GMT, Dalibor Topic wrote: >> src/java.base/linux/native/libsimdsort/avx2-32bit-qsort.hpp line 3: >> >>> 1: /* >>> 2: * Copyright (c) 2021, 2023, Intel Corporation. All rights reserved. >>> 3: * Copyright (c) 2021 Serge Sans Paille. All rights reserved. >> >> Is

Re: RFR: 8310159: Bulk copy with Unsafe::arrayCopy is slower compared to memcpy [v6]

2023-11-21 Thread Sandhya Viswanathan
On Tue, 21 Nov 2023 21:03:20 GMT, Steve Dohrmann wrote: >> Update: the XorTest::xor results shown in this message used test code from >> PR commit 7cc272e862791 which was based on Maurizio Cimadamore's commit >> a788f066af17. The XorTest has since been updated and XorTest::copy is no >>

Re: RFR: 8310159: Bulk copy with Unsafe::arrayCopy is slower compared to memcpy [v5]

2023-11-20 Thread Sandhya Viswanathan
On Mon, 20 Nov 2023 22:50:19 GMT, Steve Dohrmann wrote: >> Update: the XorTest::xor results shown in this message used test code from >> PR commit 7cc272e862791 which was based on Maurizio Cimadamore's commit >> a788f066af17. The XorTest has since been updated and XorTest::copy is no >>

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v7]

2023-11-15 Thread Sandhya Viswanathan
On Wed, 15 Nov 2023 02:17:58 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather with high performance backend

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

2023-11-15 Thread Sandhya Viswanathan
On Mon, 6 Nov 2023 18:37:41 GMT, Sandhya Viswanathan wrote: >> match_rule_supported_vector called in the beginning will enforce these >> checks. > > This method is match_rule_support_vector and it is not enforcing this check > now. It was doing so before through fall thr

Integrated: 8319572: Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags

2023-11-14 Thread Sandhya Viswanathan
On Thu, 9 Nov 2023 22:08:06 GMT, Sandhya Viswanathan wrote: > Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and thus marked > as flagless through @requires vm.flagless per > [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566). This pull request has now been i

Re: RFR: 8319572: Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags

2023-11-14 Thread Sandhya Viswanathan
On Wed, 15 Nov 2023 01:07:23 GMT, Leonid Mesnik wrote: >> Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and thus >> marked as flagless through @requires vm.flagless per >> [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566). > > Marked as reviewed by lmesnik (Reviewer).

Re: RFR: 8319572: Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags

2023-11-14 Thread Sandhya Viswanathan
On Thu, 9 Nov 2023 22:08:06 GMT, Sandhya Viswanathan wrote: > Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and thus marked > as flagless through @requires vm.flagless per > [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566). @lmesnik Could you plea

Re: RFR: 8310159: Bulk copy with Unsafe::arrayCopy is slower compared to memcpy

2023-11-14 Thread Sandhya Viswanathan
On Tue, 14 Nov 2023 08:09:28 GMT, Jatin Bhateja wrote: >> Below is baseline data collected using a modified version of the >> java.lang.foreign.xor micro benchmark referenced by @mcimadamore in the bug >> report. I collected data on an Ubuntu 22.04 laptop with a Tigerlake >> i7-1185G7,

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v5]

2023-11-09 Thread Sandhya Viswanathan
On Fri, 10 Nov 2023 01:25:49 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Review comments resolutions. > > src/hotspot/cpu/x86/c2_MacroAssembler

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v5]

2023-11-09 Thread Sandhya Viswanathan
On Thu, 9 Nov 2023 18:56:19 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather with high performance backend implementation

RFR: 8319572: Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags

2023-11-09 Thread Sandhya Viswanathan
Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and thus marked as flagless through @requires vm.flagless per [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566). - Commit messages: - Mark LoadJsvmlTest.java test as flagless Changes:

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

2023-11-06 Thread Sandhya Viswanathan
On Fri, 3 Nov 2023 22:44:39 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Restricting masked sub-word gather to AVX512 target to align with integral >> g

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

2023-11-06 Thread Sandhya Viswanathan
On Sun, 5 Nov 2023 12:58:57 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1606: >> >>> 1604: void C2_MacroAssembler::vpgather8b_offset(BasicType elem_bt, >>> XMMRegister dst, Register base, Register idx_base, >>> 1605:

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

2023-11-03 Thread Sandhya Viswanathan
On Tue, 31 Oct 2023 07:19:55 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather with high performance backend

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

2023-11-03 Thread Sandhya Viswanathan
On Tue, 31 Oct 2023 07:19:55 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather with high performance backend

Re: RFR: 8318650: Optimized subword gather for x86 targets. [v3]

2023-11-02 Thread Sandhya Viswanathan
On Tue, 31 Oct 2023 07:19:55 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and >> AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather with high performance backend

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42]

2023-10-13 Thread Sandhya Viswanathan
On Fri, 13 Oct 2023 10:31:14 GMT, himichael wrote: >> @himichael Please refer to [this >> question](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) >> for how to correctly benchmark Java code. > >> @himichael Please refer to [this >>

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v5]

2023-10-11 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 23:14:26 GMT, Vladimir Ivanov wrote: > Proposed patch has one disadvantage: there's no way to override ergonomics > decisions on AMD CPUs and forcibly enable the intrinsic without rebuilding > the JVM. > > For many other intrinsics there are flags which enable finer

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42]

2023-10-11 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 23:25:30 GMT, Vladimir Ivanov wrote: >> src/java.base/share/classes/java/util/DualPivotQuicksort.java line 157: >> >>> 155: @ForceInline >>> 156: private static void sort(Class elemType, A array, long >>> offset, int low, int high, SortOperation so) { >>> 157:

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v4]

2023-10-11 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 22:25:14 GMT, Erik Joelsson wrote: >> Hi Erik (@erikj79), >> BUILD_LIBFALLBACKLINKER is from different PR (#13079). If I understand >> correctly, for LIB_SIMD_SORT, are you suggesting that we don't pad the lines >> with spaces to align features into columns and instead

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v3]

2023-10-11 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 18:31:44 GMT, Sandhya Viswanathan wrote: >> Also @forceinline in these changes only works for case when new intrinsics >> are not used. >> I would suggest to adapt/update JMH benchmark to cover all cases and see >> effect @forceinline without intri

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v3]

2023-10-11 Thread Sandhya Viswanathan
On Tue, 10 Oct 2023 22:29:55 GMT, Vladimir Kozlov wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one >> additional commit since the last revision: >> >> fix whitespace in build script > > Also @forceinline in these changes only works for case when new

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR

2023-10-11 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 09:25:15 GMT, Andrew Haley wrote: > > Forgive me, I might be missing something very obvious, but is there any > > particular reason to entirely disable the SIMD accelerated sort on Zen 4 > > rather than having an alternate code path for Zen 4 where it has the > >

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v4]

2023-10-11 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 17:28:12 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to address the follow-up comments to the SIMD >> accelerated sort PR (#14227) which implemented AVX512 intrinsics for >> Arrays.sort() methods. >> The proposed changes are: >> >> 1) Restriction of the

Re: RFR: 8314544: Matrix multiply benchmark using Vector API [v2]

2023-10-06 Thread Sandhya Viswanathan
On Fri, 6 Oct 2023 08:32:28 GMT, Martin Stypinski wrote: >> Martin Stypinski has updated the pull request incrementally with two >> additional commits since the last revision: >> >> - changed for consistency >> - improved some RandomGenerator & unuseed Imports > > fixed typo. @Styp Thanks,

Re: RFR: 8314544: Matrix multiple benchmark using Vector API

2023-10-03 Thread Sandhya Viswanathan
On Mon, 21 Aug 2023 03:50:32 GMT, Martin Stypinski wrote: >> Added a bunch of different implementations for Vector API Matrix >> Multiplications: >> >> - Baseline >> - Blocked (Cache Local) >> - FMA >> - Vector API Simple Implementation >> - Vector API Blocked Implementation >> >> Commit was

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30]

2023-09-25 Thread Sandhya Viswanathan
On Wed, 30 Aug 2023 02:01:38 GMT, Vladimir Kozlov wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Clean up parameters passed to arrayPartition; update the check to load >> library > > Good. Thank you.

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

2023-09-20 Thread Sandhya Viswanathan
On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX512 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays.

Re: RFR: 8314085: Fixing scope from benchmark to thread for JMH tests having shared state

2023-09-05 Thread Sandhya Viswanathan
On Thu, 10 Aug 2023 15:30:19 GMT, Swati Sharma wrote: > In addition to the issue > [JDK-8311178](https://bugs.openjdk.org/browse/JDK-8311178), logically fixing > the scope from benchmark to thread for below benchmark files having shared > state, also which fixes few of the benchmarks

Re: RFR: 8314085: Fixing scope from benchmark to thread for JMH tests having shared state

2023-08-31 Thread Sandhya Viswanathan
On Thu, 10 Aug 2023 15:30:19 GMT, Swati Sharma wrote: > In addition to the issue > [JDK-8311178](https://bugs.openjdk.org/browse/JDK-8311178), logically fixing > the scope from benchmark to thread for below benchmark files having shared > state, also which fixes few of the benchmarks

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30]

2023-08-29 Thread Sandhya Viswanathan
On Tue, 29 Aug 2023 19:28:17 GMT, Alan Bateman wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Clean up parameters passed to arrayPartition; update the check to load >> library > > The changes to

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30]

2023-08-28 Thread Sandhya Viswanathan
On Mon, 28 Aug 2023 21:27:25 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX512 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays.

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v29]

2023-08-25 Thread Sandhya Viswanathan
On Fri, 25 Aug 2023 18:46:53 GMT, Vladimir Kozlov wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Remove unnecessary import in Arrays.java > > After I fixed it Tier1 passed and I submitted other tiers.

Re: RFR: 8311178: JMH tests don't scale well when sharing output buffers

2023-07-10 Thread Sandhya Viswanathan
On Sat, 1 Jul 2023 07:53:17 GMT, Swati Sharma wrote: > The below benchmark files have scaling issues due to cache contention and > leads to poor scaling when run on multiple threads. The patch sets the scope > from benchmark level to thread level to fix the issue: > -

Re: RFR: 8310459: [BACKOUT] 8304450: [vectorapi] Refactor VectorShuffle implementation

2023-06-26 Thread Sandhya Viswanathan
On Fri, 23 Jun 2023 16:43:32 GMT, Jatin Bhateja wrote: > Backing out shuffle related overhaul done with > [JDK-8304450](https://bugs.openjdk.org/browse/JDK-8304450), we saw > significant performance degradation in VectorAPI JMH micros and some of our > internal benchmarks. Following two

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter [v2]

2023-03-07 Thread Sandhya Viswanathan
On Tue, 7 Mar 2023 02:53:48 GMT, Vladimir Kozlov wrote: >> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in >> Interpreter and C1 compiler to produce the same results as C2 intrinsics on >> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-03-06 Thread Sandhya Viswanathan
On Tue, 7 Mar 2023 01:59:25 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3931: >> >>> 3929: // For results consistency both intrinsics should be enabled. >>> 3930: if >>> (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_float16ToFloat) && >>>

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-03-06 Thread Sandhya Viswanathan
On Fri, 3 Mar 2023 21:41:35 GMT, Vladimir Kozlov wrote: > Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in > Interpreter and C1 compiler to produce the same results as C2 intrinsics on > x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java >

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter

2023-03-06 Thread Sandhya Viswanathan
On Tue, 7 Mar 2023 00:52:37 GMT, Vladimir Kozlov wrote: > Note, I removed `ConvF2HFNode::Identity()` optimization because tests show > that it produces different NaN results due to skipped conversion. Yes, removing the Identity optimization is correct. It doesn't hold for NaN inputs.
