On Tue, 28 May 2024 23:52:27 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.indexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Tue, 28 May 2024 18:11:13 GMT, Scott Gibbons wrote:
>> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1333:
>>
>>> 1331:
>>> 1332: __ cmpq(nMinusK, 32);
>>> 1333: __ jae_b(L_greaterThan32);
>>
>> Should this check be (n-k+1) >= 32? And so accordingly (n-k) >= 31
>> __
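The reviewer's question comes down to counting candidate start positions. A haystack of length n and a needle of length k have n-k+1 of them (0 through n-k), so a full block of 32 candidates exists only when n-k+1 >= 32, i.e. n-k >= 31. The scalar Java model below is only a sketch: it assumes a 32-byte block (one AVX2 register of Latin-1 bytes) and a "first byte and last byte both match" filter followed by full verification. The class and method names are hypothetical, and nothing here is taken from the HotSpot stub itself.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical scalar model of a block-based substring search; not the HotSpot stub.
final class BlockIndexOfSketch {
    // Assumed block width: one 32-byte AVX2 register holds 32 Latin-1 bytes.
    static final int BLOCK = 32;

    /** First index of needle in haystack (byte-wise), or -1; matches String.indexOf on Latin-1 data. */
    static int indexOf(byte[] haystack, byte[] needle) {
        int n = haystack.length, k = needle.length;
        if (k == 0) return 0;              // "".indexOf("") and "abc".indexOf("") are 0
        if (k > n) return -1;
        int candidates = n - k + 1;        // valid start positions are 0 .. n-k
        byte first = needle[0], last = needle[k - 1];
        int i = 0;
        // A full block tests starts i .. i+BLOCK-1, which needs i + BLOCK - 1 <= n - k,
        // i.e. i + BLOCK <= candidates. For i == 0 that is n - k >= BLOCK - 1 (= 31).
        for (; i + BLOCK <= candidates; i += BLOCK) {
            int mask = 0;
            for (int j = 0; j < BLOCK; j++) {  // stands in for two broadcast compares and an AND
                if (haystack[i + j] == first && haystack[i + j + k - 1] == last) {
                    mask |= 1 << j;
                }
            }
            while (mask != 0) {                // verify surviving candidates, lowest first
                int j = Integer.numberOfTrailingZeros(mask);
                if (matchesAt(haystack, needle, i + j)) return i + j;
                mask &= mask - 1;              // clear the lowest set bit
            }
        }
        for (; i < candidates; i++) {          // fewer than BLOCK candidates left: scalar tail
            if (matchesAt(haystack, needle, i)) return i;
        }
        return -1;
    }

    private static boolean matchesAt(byte[] h, byte[] needle, int pos) {
        for (int t = 0; t < needle.length; t++) {
            if (h[pos + t] != needle[t]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] hay = "the quick brown fox jumps over the lazy dog".getBytes(StandardCharsets.ISO_8859_1);
        byte[] needle = "dog".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(indexOf(hay, needle)); // prints 40
    }
}
```

With the loop bound written as i + BLOCK <= candidates, the last byte read in a block is at index n-1, so no out-of-range load occurs; an off-by-one in the other direction would read one byte past the haystack.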
On Tue, 28 May 2024 17:30:24 GMT, Scott Gibbons wrote:
>> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 278:
>>
>>> 276: __ bind(L_nextCheck);
>>> 277: __ testq(haystack_len_p, haystack_len_p);
>>> 278: __ je(L_zeroCheckFailed);
>>
>> This check could be removed as the next
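Whichever zero-length checks the stub keeps (such as the testq/je pair on haystack_len_p above), the observable results must match String.indexOf for degenerate inputs. For reference, these are the Java-level results; the class name is only for illustration.

```java
// Java-level results a String.indexOf intrinsic has to reproduce for degenerate lengths.
final class IndexOfEdgeCases {
    public static void main(String[] args) {
        System.out.println("".indexOf(""));      // prints 0: an empty needle matches at index 0
        System.out.println("abc".indexOf(""));   // prints 0
        System.out.println("".indexOf("a"));     // prints -1: empty haystack, non-empty needle
        System.out.println("ab".indexOf("abc")); // prints -1: needle longer than haystack
    }
}
```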
On Tue, 28 May 2024 17:59:49 GMT, Scott Gibbons wrote:
>> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 578:
>>
>>> 576: // helper jumps to L_checkRangeAndReturn with a (-1) return value.
>>> 577: big_case_loop_helper(false, 0, L_checkRangeAndReturn, L_loopTop, mask,
On Sat, 25 May 2024 22:19:41 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.indexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Thu, 23 May 2024 23:12:42 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.indexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Fri, 24 May 2024 23:15:26 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.indexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Fri, 17 May 2024 23:47:45 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.indexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Wed, 22 May 2024 18:52:27 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.indexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Wed, 22 May 2024 17:40:24 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.indexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Fri, 24 May 2024 20:47:23 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.indexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Thu, 16 May 2024 17:08:21 GMT, Scott Gibbons wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 238:
>>
>>> 236: const Register needle = rdx;
>>> 237: const Register needle_len = rcx;
>>> 238:
>>
>> This is the calling convention on Linux. How is Windows
On Thu, 16 May 2024 20:22:40 GMT, Scott Gibbons wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1510:
>>
>>> 1508: compare_big_haystack_to_needle(sizeKnown, size, NUMBER_OF_NEEDLE_BYTES_TO_COMPARE, loop_top, hsPtrRet, hsLength,
>>> 1509:
On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski wrote:
>> Performance. Before:
>>
>> Benchmark                  (algorithm)      (dataSize)  (keyLength)  (provider)  Mode  Cnt  Score  Error  Units
>> SignatureBench.ECDSA.sign  SHA256withECDSA  1024        256
On Fri, 10 May 2024 00:19:32 GMT, Volodymyr Paprotski wrote:
>> Performance. Before:
>>
>> Benchmark                  (algorithm)      (dataSize)  (keyLength)  (provider)  Mode  Cnt  Score  Error  Units
>> SignatureBench.ECDSA.sign  SHA256withECDSA  1024        256
On Sat, 4 May 2024 19:35:21 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.indexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Tue, 16 Apr 2024 00:04:15 GMT, Scott Gibbons wrote:
>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See
>> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around
>> this change.
>>
>> Overall, making this an intrinsic improves overall
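Unsafe::setMemory fills a range of memory with a single byte value, and a later snippet in this thread shows the stub calling generate_fill(T_BYTE, ...). The general idea of such a fill routine can be modelled in plain Java: replicate the byte across a wide word, write the range with wide stores, and finish the tail byte by byte. This is a hypothetical sketch (WideFillSketch and fill are made-up names); 8-byte long stores stand in for the 32- or 64-byte vector stores an x86_64 stub would use.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical model of a byte fill using wide stores; not the HotSpot stub.
final class WideFillSketch {
    // Views a byte[] as longs; plain get/set on such a view may be unaligned.
    private static final VarHandle LONGS =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    /** Sets bytes [offset, offset + length) of a to value, 8 bytes per store where possible. */
    static void fill(byte[] a, int offset, int length, byte value) {
        Objects.checkFromIndexSize(offset, length, a.length);
        // Replicate the byte into all 8 byte lanes, e.g. 0x2A -> 0x2A2A2A2A2A2A2A2AL.
        long pattern = (value & 0xFFL) * 0x0101010101010101L;
        int i = offset, end = offset + length;
        for (; end - i >= Long.BYTES; i += Long.BYTES) {
            LONGS.set(a, i, pattern);           // one wide store covers 8 bytes
        }
        for (; i < end; i++) {                  // tail: fewer than 8 bytes left
            a[i] = value;
        }
    }

    public static void main(String[] args) {
        byte[] buf = new byte[12];
        fill(buf, 2, 9, (byte) 0x2A);
        System.out.println(Arrays.toString(buf)); // prints [0, 0, 42, 42, 42, 42, 42, 42, 42, 42, 42, 0]
    }
}
```

Because every byte of the pattern is identical, the byte order chosen for the view does not affect the result.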
On Mon, 15 Apr 2024 23:01:21 GMT, Jorn Vernee wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2686:
>>
>>> 2684: __ movq(rdx, rsi);
>>> 2685: restore_arg_regs();
>>> 2686: #endif
>>
>> This is stubGenerator_x86_64.cpp 64bit specific, so WIN32 portion could be
On Fri, 12 Apr 2024 16:47:58 GMT, Scott Gibbons wrote:
>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See
>> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around
>> this change.
>>
>> Overall, making this an intrinsic improves overall
On Mon, 15 Apr 2024 18:43:24 GMT, Scott Gibbons wrote:
>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See
>> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around
>> this change.
>>
>> Overall, making this an intrinsic improves overall
On Fri, 12 Apr 2024 00:10:22 GMT, Sandhya Viswanathan wrote:
>> Scott Gibbons has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Addressing yet more review comments
>
> src/hotspot/cpu/x86/stubGenerator_
On Fri, 12 Apr 2024 00:07:56 GMT, Scott Gibbons wrote:
>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See
>> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around
>> this change.
>>
>> Overall, making this an intrinsic improves overall
On Fri, 12 Apr 2024 00:00:38 GMT, Scott Gibbons wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2751:
>>
>>> 2749: UnsafeSetMemoryMark usmm(this, true, true);
>>> 2750:
>>> 2751: __ generate_fill(T_BYTE, false, c_rarg0, c_rarg1, r11, rax, xmm0);
>>
On Thu, 11 Apr 2024 21:47:01 GMT, Scott Gibbons wrote:
>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See
>> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around
>> this change.
>>
>> Overall, making this an intrinsic improves overall
On Thu, 11 Apr 2024 20:58:00 GMT, Scott Gibbons wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 735:
>>
>>> 733:
>>> 734: if (MaxVectorSize == 64) {
>>> 735: UnsafeCopyMemoryMark ucmm(this, !is_oop && !aligned, false, ucme_exit_pc);
>>
>> This is not related
On Thu, 11 Apr 2024 18:42:56 GMT, Scott Gibbons wrote:
>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See
>> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around
>> this change.
>>
>> Overall, making this an intrinsic improves overall
On Mon, 8 Apr 2024 19:11:19 GMT, Scott Gibbons wrote:
>> This code makes an intrinsic stub for `Unsafe::setMemory` for x86_64. See
>> [this PR](https://github.com/openjdk/jdk/pull/16760) for discussion around
>> this change.
>>
>> Overall, making this an intrinsic improves overall
On Wed, 13 Mar 2024 11:59:36 GMT, Magnus Ihse Bursie wrote:
>> As part of the ongoing effort to enable jcheck whitespace checking to all
>> text files, it is now time to address assembly (.S) files. The hotspot
>> assembly files were fixed as part of the hotspot mapfile removal, so only a
>>
On Wed, 7 Feb 2024 18:38:29 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially
>>
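x86 gather instructions (vpgatherdd/dq/qd/qq) only operate on 32- and 64-bit elements, which is why byte and short gathers need a separate algorithm. The lane semantics being emulated are simple; the scalar model below follows the behaviour documented for the Vector API's ByteVector.fromArray(species, a, offset, indexMap, mapOffset) and its masked variant (unset lanes are zero and their indices are not read). ByteGatherSketch and its methods are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical scalar model of byte gather semantics; not the C2 implementation.
final class ByteGatherSketch {
    /** Lane i of the result is a[offset + indexMap[mapOffset + i]], for 'lanes' lanes. */
    static byte[] gather(byte[] a, int offset, int[] indexMap, int mapOffset, int lanes) {
        byte[] out = new byte[lanes];
        for (int i = 0; i < lanes; i++) {
            out[i] = a[offset + indexMap[mapOffset + i]]; // throws if an index is out of range
        }
        return out;
    }

    /** Masked variant: unset lanes stay zero and neither indexMap nor a is read for them. */
    static byte[] gatherMasked(byte[] a, int offset, int[] indexMap, int mapOffset, boolean[] mask) {
        byte[] out = new byte[mask.length];
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                out[i] = a[offset + indexMap[mapOffset + i]];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] a = {10, 11, 12, 13, 14, 15, 16, 17};
        int[] map = {7, 0, 3, 3, 1};
        System.out.println(Arrays.toString(gather(a, 0, map, 1, 4))); // prints [10, 13, 13, 11]
    }
}
```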
On Thu, 1 Feb 2024 16:24:16 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially
>>
On Wed, 31 Jan 2024 21:31:21 GMT, Sandhya Viswanathan wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Review comments resolutions.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
On Sun, 21 Jan 2024 06:55:43 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially
>>
On Sat, 20 Jan 2024 09:55:45 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> This patch optimizes the non-subword vector compress and expand APIs for
>> x86 AVX2-only targets.
>> Upcoming E-core Xeons (Sierra Forest) and hybrid CPUs support only the AVX2
>> instruction set.
>> These are very frequently used APIs
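For reference, the lane-wise semantics of the Vector API's compress and expand (as documented for Vector.compress(VectorMask) and Vector.expand(VectorMask)) can be written as a scalar Java model over int lanes. This is only a sketch of the contract the AVX2 code has to honour; CompressExpandSketch is a hypothetical name, and the real API works on Vector and VectorMask objects rather than arrays.

```java
import java.util.Arrays;

// Hypothetical scalar model of Vector.compress / Vector.expand on int lanes.
final class CompressExpandSketch {
    /** Lanes whose mask bit is set are packed, in order, into the lowest lanes; the rest are zero. */
    static int[] compress(int[] v, boolean[] m) {
        int[] out = new int[v.length];
        int j = 0;
        for (int i = 0; i < v.length; i++) {
            if (m[i]) out[j++] = v[i];
        }
        return out;
    }

    /** The k-th set mask lane receives v[k], taken in order from lane 0; unset lanes are zero. */
    static int[] expand(int[] v, boolean[] m) {
        int[] out = new int[v.length];
        int j = 0;
        for (int i = 0; i < v.length; i++) {
            if (m[i]) out[i] = v[j++];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] v = {1, 2, 3, 4, 5, 6, 7, 8};
        boolean[] m = {true, false, true, true, false, false, true, false};
        System.out.println(Arrays.toString(compress(v, m))); // prints [1, 3, 4, 7, 0, 0, 0, 0]
        System.out.println(Arrays.toString(expand(v, m)));   // prints [1, 0, 2, 3, 0, 0, 4, 0]
    }
}
```

Expanding a compressed vector with the same mask restores the selected lanes and zeroes the others, which makes a convenient cross-check for any vectorized implementation.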
On Fri, 19 Jan 2024 19:03:31 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> This patch optimizes the non-subword vector compress and expand APIs for
>> x86 AVX2-only targets.
>> Upcoming E-core Xeons (Sierra Forest) and hybrid CPUs support only the AVX2
>> instruction set.
>> These are very frequently used APIs
On Wed, 29 Nov 2023 15:01:32 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.indexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Tue, 19 Dec 2023 18:42:19 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.indexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Sun, 17 Dec 2023 13:25:00 GMT, Guoxiong Li wrote:
>> Hi all,
>>
>> This patch fixes the build failure introduced by
>> [JDK-8319577](https://bugs.openjdk.org/browse/JDK-8319577) with old GCC
>> versions (Linux & GCC 7.5.0 locally).
>>
>> Thanks for the review.
>>
>> Best Regards,
>> --
On Tue, 19 Dec 2023 19:08:08 GMT, Kim Barrett wrote:
>>> Have you tested with gcc 9? Or is this just supposition based on gcc 9
>>> having removed the experimental status for C++17?
>>
>> I have not tested GCC 8 and 9. @sviswa7 seems to test them.
>>
>>> I have verified that with the above
On Tue, 19 Dec 2023 02:22:05 GMT, Guoxiong Li wrote:
>> Guoxiong Li has updated the pull request with a new target base due to a
>> merge or a rebase. The incremental webrev excludes the unrelated changes
>> brought in by the merge/rebase. The pull request contains four additional
>> commits
On Wed, 6 Dec 2023 18:26:34 GMT, Vladimir Kozlov wrote:
>> @TobiHartmann @vnkozlov Please advise if we can go ahead and integrate this
>> PR today before the fork.
>
>> @TobiHartmann @vnkozlov Please advise if we can go ahead and integrate this
>> PR today before the fork.
>
> Too late. Changes
On Wed, 6 Dec 2023 17:48:04 GMT, Srinivas Vamsi Parasa wrote:
>> The goal is to develop faster sort routines for x86_64 CPUs by taking
>> advantage of AVX2 instructions. This enhancement provides an order of
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>>
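Whatever the SIMD kernels do internally, Arrays.sort must keep its documented ordering. For float and double arrays that includes cases where plain vector min/max instructions are easy to get wrong: Arrays.sort uses the total order of Double.compareTo, so -0.0 sorts before 0.0 and NaN sorts after everything else. A small check of that contract (SortOrderingExample is a hypothetical name):

```java
import java.util.Arrays;

// Checks the documented ordering that any accelerated Arrays.sort must preserve.
final class SortOrderingExample {
    /** Returns a sorted copy; Arrays.sort(double[]) uses the total order of Double.compareTo. */
    static double[] sortedCopy(double[] values) {
        double[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        double[] d = {3.5, Double.NaN, 0.0, -0.0, -1.0, Double.NEGATIVE_INFINITY};
        System.out.println(Arrays.toString(sortedCopy(d)));
        // prints [-Infinity, -1.0, -0.0, 0.0, 3.5, NaN]
    }
}
```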
On Mon, 4 Dec 2023 22:15:24 GMT, Srinivas Vamsi Parasa wrote:
>> The goal is to develop faster sort routines for x86_64 CPUs by taking
>> advantage of AVX2 instructions. This enhancement provides an order of
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>>
On Tue, 28 Nov 2023 20:52:35 GMT, Srinivas Vamsi Parasa wrote:
>> Thanks Sandhya, will fix this issue.
>
> Thanks Sandhya for suggesting the change to use supports_simd_sort(BasicType
> bt). Please see the updated code upstreamed.
@vamsi-parasa Thanks, your changes look good to me.
On Sat, 18 Nov 2023 01:21:09 GMT, Srinivas Vamsi Parasa wrote:
>> The goal is to develop faster sort routines for x86_64 CPUs by taking
>> advantage of AVX2 instructions. This enhancement provides an order of
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>>
On Tue, 21 Nov 2023 15:14:28 GMT, Dalibor Topic wrote:
>> src/java.base/linux/native/libsimdsort/avx2-32bit-qsort.hpp line 3:
>>
>>> 1: /*
>>> 2: * Copyright (c) 2021, 2023, Intel Corporation. All rights reserved.
>>> 3: * Copyright (c) 2021 Serge Sans Paille. All rights reserved.
>>
>> Is
On Tue, 21 Nov 2023 21:03:20 GMT, Steve Dohrmann wrote:
>> Update: the XorTest::xor results shown in this message used test code from
>> PR commit 7cc272e862791 which was based on Maurizio Cimadamore's commit
>> a788f066af17. The XorTest has since been updated and XorTest::copy is no
>>
On Mon, 20 Nov 2023 22:50:19 GMT, Steve Dohrmann wrote:
>> Update: the XorTest::xor results shown in this message used test code from
>> PR commit 7cc272e862791 which was based on Maurizio Cimadamore's commit
>> a788f066af17. The XorTest has since been updated and XorTest::copy is no
>>
On Wed, 15 Nov 2023 02:17:58 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather with high performance backend
On Mon, 6 Nov 2023 18:37:41 GMT, Sandhya Viswanathan wrote:
>> match_rule_supported_vector called in the beginning will enforce these
>> checks.
>
> This method is match_rule_support_vector and it is not enforcing this check
> now. It was doing so before through fall thr
On Thu, 9 Nov 2023 22:08:06 GMT, Sandhya Viswanathan wrote:
> Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and thus marked
> as flagless through @requires vm.flagless per
> [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566).
This pull request has now been i
On Wed, 15 Nov 2023 01:07:23 GMT, Leonid Mesnik wrote:
>> Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and thus
>> marked as flagless through @requires vm.flagless per
>> [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566).
>
> Marked as reviewed by lmesnik (Reviewer).
On Thu, 9 Nov 2023 22:08:06 GMT, Sandhya Viswanathan wrote:
> Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and thus marked
> as flagless through @requires vm.flagless per
> [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566).
@lmesnik Could you plea
On Tue, 14 Nov 2023 08:09:28 GMT, Jatin Bhateja wrote:
>> Below is baseline data collected using a modified version of the
>> java.lang.foreign.xor micro benchmark referenced by @mcimadamore in the bug
>> report. I collected data on an Ubuntu 22.04 laptop with a Tigerlake
>> i7-1185G7,
On Fri, 10 Nov 2023 01:25:49 GMT, Sandhya Viswanathan wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Review comments resolutions.
>
> src/hotspot/cpu/x86/c2_MacroAssembler
On Thu, 9 Nov 2023 18:56:19 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather with high performance backend implementation
Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and thus marked
as flagless through @requires vm.flagless per
[JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566).
-
Commit messages:
- Mark LoadJsvmlTest.java test as flagless
Changes:
On Fri, 3 Nov 2023 22:44:39 GMT, Sandhya Viswanathan wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Restricting masked sub-word gather to AVX512 target to align with integral
>> g
On Sun, 5 Nov 2023 12:58:57 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1606:
>>
>>> 1604: void C2_MacroAssembler::vpgather8b_offset(BasicType elem_bt, XMMRegister dst, Register base, Register idx_base,
>>> 1605:
On Tue, 31 Oct 2023 07:19:55 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather with high performance backend
On Fri, 13 Oct 2023 10:31:14 GMT, himichael wrote:
>> @himichael Please refer to [this
>> question](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java)
>> for how to correctly benchmark Java code.
>
>> @himichael Please refer to [this
>>
On Wed, 11 Oct 2023 23:14:26 GMT, Vladimir Ivanov wrote:
> Proposed patch has one disadvantage: there's no way to override ergonomics
> decisions on AMD CPUs and forcibly enable the intrinsic without rebuilding
> the JVM.
>
> For many other intrinsics there are flags which enable finer
On Wed, 11 Oct 2023 23:25:30 GMT, Vladimir Ivanov wrote:
>> src/java.base/share/classes/java/util/DualPivotQuicksort.java line 157:
>>
>>> 155: @ForceInline
>>> 156: private static void sort(Class elemType, A array, long offset, int low, int high, SortOperation so) {
>>> 157:
On Wed, 11 Oct 2023 22:25:14 GMT, Erik Joelsson wrote:
>> Hi Erik (@erikj79),
>> BUILD_LIBFALLBACKLINKER is from different PR (#13079). If I understand
>> correctly, for LIB_SIMD_SORT, are you suggesting that we don't pad the lines
>> with spaces to align features into columns and instead
On Wed, 11 Oct 2023 18:31:44 GMT, Sandhya Viswanathan wrote:
>> Also, @forceinline in these changes only works for the case when the new
>> intrinsics are not used.
>> I would suggest adapting/updating the JMH benchmark to cover all cases and
>> seeing the effect of @forceinline without intri
On Tue, 10 Oct 2023 22:29:55 GMT, Vladimir Kozlov wrote:
>> Srinivas Vamsi Parasa has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> fix whitespace in build script
>
> Also, @forceinline in these changes only works for the case when the new
On Wed, 11 Oct 2023 09:25:15 GMT, Andrew Haley wrote:
> > Forgive me, I might be missing something very obvious, but is there any
> > particular reason to entirely disable the SIMD accelerated sort on Zen 4
> > rather than having an alternate code path for Zen 4 where it has the
> >
On Wed, 11 Oct 2023 17:28:12 GMT, Srinivas Vamsi Parasa wrote:
>> The goal of this PR is to address the follow-up comments to the SIMD
>> accelerated sort PR (#14227) which implemented AVX512 intrinsics for
>> Arrays.sort() methods.
>> The proposed changes are:
>>
>> 1) Restriction of the
On Fri, 6 Oct 2023 08:32:28 GMT, Martin Stypinski wrote:
>> Martin Stypinski has updated the pull request incrementally with two
>> additional commits since the last revision:
>>
>> - changed for consistency
>> - improved some RandomGenerator & unused imports
>
> fixed typo.
@Styp Thanks,
On Mon, 21 Aug 2023 03:50:32 GMT, Martin Stypinski wrote:
>> Added a bunch of different implementations for Vector API Matrix
>> Multiplications:
>>
>> - Baseline
>> - Blocked (Cache Local)
>> - FMA
>> - Vector API Simple Implementation
>> - Vector API Blocked Implementation
>>
>> Commit was
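The Baseline, Blocked and FMA variants listed above differ mainly in loop order, tiling, and whether the multiply-add is fused. A scalar sketch of a baseline triple loop and a blocked i-k-j variant using Math.fma, for row-major n x n double matrices, might look like the following. MatMulSketch and the tile size parameter bs are hypothetical (bs is not a value taken from the commit), and the Vector API variants are not shown.

```java
import java.util.Arrays;

// Hypothetical scalar sketches of two matrix-multiplication variants; c = a * b, row-major n x n.
final class MatMulSketch {
    /** Baseline i-j-k triple loop with a running dot product. */
    static double[] baseline(double[] a, double[] b, int n) {
        double[] c = new double[n * n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0;
                for (int k = 0; k < n; k++) {
                    sum += a[i * n + k] * b[k * n + j];
                }
                c[i * n + j] = sum;
            }
        }
        return c;
    }

    /** Blocked (cache-local) i-k-j order; the inner loop walks rows of b and c contiguously. */
    static double[] blockedFma(double[] a, double[] b, int n, int bs) {
        double[] c = new double[n * n];
        for (int ii = 0; ii < n; ii += bs)
            for (int kk = 0; kk < n; kk += bs)
                for (int jj = 0; jj < n; jj += bs)
                    for (int i = ii; i < Math.min(ii + bs, n); i++)
                        for (int k = kk; k < Math.min(kk + bs, n); k++) {
                            double aik = a[i * n + k];
                            for (int j = jj; j < Math.min(jj + bs, n); j++) {
                                // fused multiply-add: one rounding instead of two
                                c[i * n + j] = Math.fma(aik, b[k * n + j], c[i * n + j]);
                            }
                        }
        return c;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4}, b = {5, 6, 7, 8};
        System.out.println(Arrays.toString(blockedFma(a, b, 2, 1))); // prints [19.0, 22.0, 43.0, 50.0]
    }
}
```

Because the summation order and rounding differ, the two variants can disagree in the last bits for general inputs; comparisons between them should use a tolerance.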
On Wed, 30 Aug 2023 02:01:38 GMT, Vladimir Kozlov wrote:
>> Srinivas Vamsi Parasa has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Clean up parameters passed to arrayPartition; update the check to load
>> library
>
> Good. Thank you.
On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa wrote:
>> The goal is to develop faster sort routines for x86_64 CPUs by taking
>> advantage of AVX512 instructions. This enhancement provides an order of
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
On Thu, 10 Aug 2023 15:30:19 GMT, Swati Sharma wrote:
> In addition to the issue
> [JDK-8311178](https://bugs.openjdk.org/browse/JDK-8311178), this logically changes
> the scope from benchmark to thread for the benchmark files below that have shared
> state, which also fixes a few of the benchmarks
On Tue, 29 Aug 2023 19:28:17 GMT, Alan Bateman wrote:
>> Srinivas Vamsi Parasa has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Clean up parameters passed to arrayPartition; update the check to load
>> library
>
> The changes to
On Mon, 28 Aug 2023 21:27:25 GMT, Srinivas Vamsi Parasa wrote:
>> The goal is to develop faster sort routines for x86_64 CPUs by taking
>> advantage of AVX512 instructions. This enhancement provides an order of
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
On Fri, 25 Aug 2023 18:46:53 GMT, Vladimir Kozlov wrote:
>> Srinivas Vamsi Parasa has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Remove unnecessary import in Arrays.java
>
> After I fixed it Tier1 passed and I submitted other tiers.
On Sat, 1 Jul 2023 07:53:17 GMT, Swati Sharma wrote:
> The benchmark files below have scaling issues due to cache contention, which
> leads to poor scaling when run on multiple threads. The patch changes the scope
> from benchmark level to thread level to fix the issue:
> -
On Fri, 23 Jun 2023 16:43:32 GMT, Jatin Bhateja wrote:
> Backing out shuffle related overhaul done with
> [JDK-8304450](https://bugs.openjdk.org/browse/JDK-8304450), we saw
> significant performance degradation in VectorAPI JMH micros and some of our
> internal benchmarks. Following two
On Tue, 7 Mar 2023 02:53:48 GMT, Vladimir Kozlov wrote:
>> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in
>> Interpreter and C1 compiler to produce the same results as C2 intrinsics on
>> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java
On Tue, 7 Mar 2023 01:59:25 GMT, Vladimir Kozlov wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3931:
>>
>>> 3929: // For results consistency both intrinsics should be enabled.
>>> 3930: if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_float16ToFloat) &&
>>>
On Fri, 3 Mar 2023 21:41:35 GMT, Vladimir Kozlov wrote:
> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in
> Interpreter and C1 compiler to produce the same results as C2 intrinsics on
> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java
>
On Tue, 7 Mar 2023 00:52:37 GMT, Vladimir Kozlov wrote:
> Note, I removed `ConvF2HFNode::Identity()` optimization because tests show
> that it produces different NaN results due to skipped conversion.
Yes, removing the Identity optimization is correct. It doesn't hold for NaN
inputs.
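Widening an IEEE 754 binary16 value to float is exact and purely a bit manipulation, which is why interpreter, C1 and C2 can be expected to agree bit-for-bit; NaN encodings are the part that needs care, matching the NaN discrepancy noted above. The decoder below is a hand-written sketch (the class name Half is hypothetical and this is not the JDK's implementation; JDK 20 and later provide Float.float16ToFloat directly).

```java
// Hypothetical hand-written binary16 -> float decoder, for illustration only.
final class Half {
    /** Decodes the IEEE 754 binary16 bits in h to the exactly equal float value. */
    static float float16ToFloat(short h) {
        int bits = h & 0xFFFF;
        int sign = bits >>> 15;
        int exp = (bits >>> 10) & 0x1F;   // 5-bit exponent, bias 15
        int mant = bits & 0x3FF;          // 10-bit significand
        if (exp == 0) {
            // Zero or subnormal: value is mant * 2^-24, exactly representable as a float.
            float m = mant * 0x1p-24f;
            return sign == 0 ? m : -m;
        }
        // Infinity/NaN keep an all-ones exponent; normals are rebiased from 15 to 127 (+112).
        // NaN significand bits are copied into the top of the float significand, although
        // Float.intBitsToFloat may not preserve signaling-NaN bit patterns on every platform.
        int e = (exp == 0x1F) ? 0xFF : exp + 112;
        return Float.intBitsToFloat((sign << 31) | (e << 23) | (mant << 13));
    }

    public static void main(String[] args) {
        System.out.println(float16ToFloat((short) 0x3C00)); // prints 1.0
        System.out.println(float16ToFloat((short) 0x7BFF)); // prints 65504.0
    }
}
```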