On Thu, 10 Aug 2023 15:30:19 GMT, Swati Sharma wrote:
> In addition to the issue
> [JDK-8311178](https://bugs.openjdk.org/browse/JDK-8311178), changing the
> scope from benchmark to thread for the benchmark files below that have shared
> state, which also fixes a few of the benchmarks
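The state-scope fix described above can be illustrated outside JMH: a `Scope.Benchmark` state object is shared by every worker thread, while `Scope.Thread` gives each thread a private instance, so threads no longer contend on the same cache lines. A plain-Java sketch of that distinction (illustrative names, not code from the patch):

```java
import java.util.concurrent.atomic.AtomicLong;

// Plain-Java analogue of JMH state scopes: a Scope.Benchmark state is one
// object shared by all threads, while a Scope.Thread state gives every
// thread its own copy, eliminating contention on the state's fields.
public class StateScopeSketch {
    static final AtomicLong shared = new AtomicLong();          // ~ Scope.Benchmark
    static final ThreadLocal<long[]> perThread =
        ThreadLocal.withInitial(() -> new long[1]);             // ~ Scope.Thread

    public static void main(String[] args) throws Exception {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                shared.incrementAndGet();   // every increment contends on one cache line
                perThread.get()[0]++;       // each thread mutates private memory
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println(shared.get());   // both threads hit the shared counter
    }
}
```

In JMH itself the fix is only the annotation on the state class: `@State(Scope.Benchmark)` becomes `@State(Scope.Thread)`.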
On Tue, 29 Aug 2023 19:28:17 GMT, Alan Bateman wrote:
>> Srinivas Vamsi Parasa has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Clean up parameters passed to arrayPartition; update the check to load
>> library
>
> The changes to
On Tue, 10 Oct 2023 22:29:55 GMT, Vladimir Kozlov wrote:
>> Srinivas Vamsi Parasa has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> fix whitespace in build script
>
> Also @forceinline in these changes only works for the case when new
On Wed, 11 Oct 2023 09:25:15 GMT, Andrew Haley wrote:
> > Forgive me, I might be missing something very obvious, but is there any
> > particular reason to entirely disable the SIMD accelerated sort on Zen 4
> > rather than having an alternate code path for Zen 4 where it has the
> >
On Wed, 11 Oct 2023 17:28:12 GMT, Srinivas Vamsi Parasa
wrote:
>> The goal of this PR is to address the follow-up comments to the SIMD
>> accelerated sort PR (#14227) which implemented AVX512 intrinsics for
>> Arrays.sort() methods.
>> The proposed changes are:
>>
>> 1) Restriction of the
On Tue, 31 Oct 2023 07:19:55 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather with high performance backend
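For reference, the gather being intrinsified has simple scalar semantics: each output lane loads `source[index[i]]`. The sketch below is only a behavioral model of a sub-word (short-element) gather, not the HotSpot backend implementation:

```java
import java.util.Arrays;

// Scalar reference semantics of a sub-word gather: output lane i loads
// source[index[i]]. The patch implements this pattern with AVX2/AVX512
// backend code for byte/short elements; this is only the behavioral model.
public class GatherSketch {
    static short[] gather(short[] source, int[] index) {
        short[] out = new short[index.length];
        for (int i = 0; i < index.length; i++) {
            out[i] = source[index[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        short[] src = {10, 20, 30, 40, 50};
        int[] idx = {4, 0, 2, 2};
        System.out.println(Arrays.toString(gather(src, idx)));
    }
}
```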
On Fri, 3 Nov 2023 22:44:39 GMT, Sandhya Viswanathan
wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Restricting masked sub-word gather to AVX512 target to align with integral
>> g
On Sun, 5 Nov 2023 12:58:57 GMT, Jatin Bhateja wrote:
>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1606:
>>
>>> 1604: void C2_MacroAssembler::vpgather8b_offset(BasicType elem_bt,
>>> XMMRegister dst, Register base, Register idx_base,
>>> 1605:
On Wed, 30 Aug 2023 02:01:38 GMT, Vladimir Kozlov wrote:
>> Srinivas Vamsi Parasa has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Clean up parameters passed to arrayPartition; update the check to load
>> library
>
> Good. Thank you.
On Mon, 21 Aug 2023 03:50:32 GMT, Martin Stypinski wrote:
>> Added a bunch of different implementations for Vector API Matrix
>> Multiplications:
>>
>> - Baseline
>> - Blocked (Cache Local)
>> - FMA
>> - Vector API Simple Implementation
>> - Vector API Blocked Implementation
>>
>> Commit was
On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa
wrote:
>> The goal is to develop faster sort routines for x86_64 CPUs by taking
>> advantage of AVX512 instructions. This enhancement provides an order of
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
On Wed, 11 Oct 2023 23:14:26 GMT, Vladimir Ivanov wrote:
> The proposed patch has one disadvantage: there's no way to override ergonomics
> decisions on AMD CPUs and forcibly enable the intrinsic without rebuilding
> the JVM.
>
> For many other intrinsics there are flags which enable finer
On Wed, 11 Oct 2023 23:25:30 GMT, Vladimir Ivanov wrote:
>> src/java.base/share/classes/java/util/DualPivotQuicksort.java line 157:
>>
>>> 155: @ForceInline
>>> 156: private static void sort(Class elemType, A array, long
>>> offset, int low, int high, SortOperation so) {
>>> 157:
On Wed, 11 Oct 2023 22:25:14 GMT, Erik Joelsson wrote:
>> Hi Erik (@erikj79),
>> BUILD_LIBFALLBACKLINKER is from different PR (#13079). If I understand
>> correctly, for LIB_SIMD_SORT, are you suggesting that we don't pad the lines
>> with spaces to align features into columns and instead
On Wed, 11 Oct 2023 18:31:44 GMT, Sandhya Viswanathan
wrote:
>> Also @forceinline in these changes only works for the case when new
>> intrinsics are not used.
>> I would suggest adapting/updating the JMH benchmark to cover all cases and
>> see the effect of @forceinline without intri
On Fri, 6 Oct 2023 08:32:28 GMT, Martin Stypinski wrote:
>> Martin Stypinski has updated the pull request incrementally with two
>> additional commits since the last revision:
>>
>> - changed for consistency
>> - improved some RandomGenerator & unused Imports
>
> fixed typo.
@Styp Thanks,
On Fri, 13 Oct 2023 10:31:14 GMT, himichael wrote:
>> @himichael Please refer to [this
>> question](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java)
>> for how to correctly benchmark Java code.
>
>> @himichael Please refer to [this
>>
On Fri, 25 Aug 2023 18:46:53 GMT, Vladimir Kozlov wrote:
>> Srinivas Vamsi Parasa has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Remove unnecessary import in Arrays.java
>
> After I fixed it Tier1 passed and I submitted other tiers.
On Mon, 28 Aug 2023 21:27:25 GMT, Srinivas Vamsi Parasa
wrote:
>> The goal is to develop faster sort routines for x86_64 CPUs by taking
>> advantage of AVX512 instructions. This enhancement provides an order of
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
On Fri, 10 Nov 2023 01:25:49 GMT, Sandhya Viswanathan
wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Review comments resolutions.
>
> src/hotspot/cpu/x86/c2_MacroAssembler
Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and is thus
marked as flagless through `@requires vm.flagless` per
[JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566).
-
Commit messages:
- Mark LoadJsvmlTest.java test as flagless
Changes:
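For context, a jtreg test that ignores externally supplied VM flags declares that with a `@requires vm.flagless` tag in its test header, roughly like this (illustrative header, not the actual test file):

```java
/*
 * @test
 * @summary Verify loading of the SVML stub library (illustrative header)
 * @requires vm.flagless
 * @modules jdk.incubator.vector
 * @run main/othervm LoadJsvmlTest
 */
```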
On Thu, 9 Nov 2023 18:56:19 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather with high performance backend implementation
On Wed, 24 Aug 2022 23:48:36 GMT, Smita Kamath wrote:
>> 8289552: Make intrinsic conversions between bit representations of half
>> precision values and floats
>
> Smita Kamath has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Updated
On Fri, 2 Sep 2022 00:52:49 GMT, Smita Kamath wrote:
>> 8289552: Make intrinsic conversions between bit representations of half
>> precision values and floats
>
> Smita Kamath has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Addressed
On Thu, 1 Sep 2022 18:31:07 GMT, Smita Kamath wrote:
>> 8289552: Make intrinsic conversions between bit representations of half
>> precision values and floats
>
> Smita Kamath has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Addressed
On Thu, 1 Sep 2022 18:26:52 GMT, Smita Kamath wrote:
>> src/hotspot/cpu/x86/x86_64.ad line 11330:
>>
>>> 11328: ins_pipe( pipe_slow );
>>> 11329: %}
>>> 11330:
>>
>> For HF2F, good to also add optimized rule with LoadS to benefit from
>> vcvtph2ps memory src form of instruction.
>>
On Thu, 1 Sep 2022 23:22:46 GMT, Smita Kamath wrote:
>> 8289552: Make intrinsic conversions between bit representations of half
>> precision values and floats
>
> Smita Kamath has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Added missing
On Thu, 29 Sep 2022 18:34:41 GMT, Vladimir Kozlov wrote:
>> @vnkozlov I have addressed all review comments. Could you please run the
>> patch through your testing? Thanks a lot for all the help.
>
> @smita-kamath I have builds failures. Please, build and test yourself to
> verify changes.
>
>
On Fri, 5 Aug 2022 23:58:49 GMT, Joe Darcy wrote:
>> @jddarcy Thanks for your comment. I am not sure if there is a way of using
>> Java library implementation here.
>
> I was under the impression that if a platform didn't have special support for
> the functionality in question it could not
On Fri, 5 Aug 2022 16:36:23 GMT, Smita Kamath wrote:
> 8289552: Make intrinsic conversions between bit representations of half
> precision values and floats
src/hotspot/cpu/x86/assembler_x86.cpp line 1927:
> 1925: assert(VM_Version::supports_evex(), "");
> 1926: InstructionAttr
On Thu, 22 Dec 2022 13:10:02 GMT, Claes Redestad wrote:
>> @cl4es Thanks for passing the constant node through, the code looks much
>> cleaner now. The attached patch should handle the signed bytes/shorts as
>> well. Please take a look.
>>
On Mon, 9 Jan 2023 23:13:29 GMT, Claes Redestad wrote:
>> Claes Redestad has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Explicitly lea external address
>
> Explicitly loading the address to a register seems to do the trick, avoiding
>
On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad wrote:
>> Continuing the work initiated by @luhenry to unroll and then intrinsify
>> polynomial hash loops.
>>
>> I've rewired the library changes to route via a single `@IntrinsicCandidate`
>> method. To make this work I've harmonized how they
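The polynomial hash loop being unrolled and intrinsified computes `h = 31*h + c` per element; unrolling by four folds the powers `31^4 .. 31` into one step per block of four elements. A scalar sketch (not the JDK code) showing the unrolled form agrees with `String.hashCode()`:

```java
// Unrolled-by-4 polynomial hash: after four scalar steps of h = 31*h + c,
// h equals h*31^4 + c0*31^3 + c1*31^2 + c2*31 + c3, so the loop can process
// four elements per iteration with precomputed constants
// (31^2 = 961, 31^3 = 29791, 31^4 = 923521).
public class PolyHashSketch {
    static int polyHashUnrolled(char[] a) {
        int h = 0;
        int i = 0;
        for (; i + 3 < a.length; i += 4) {
            h = h * 923521 + a[i] * 29791 + a[i + 1] * 961
                    + a[i + 2] * 31 + a[i + 3];
        }
        for (; i < a.length; i++) {   // scalar tail for the remaining elements
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "hello world";
        System.out.println(polyHashUnrolled(s.toCharArray()) == s.hashCode()); // true
    }
}
```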
On Sun, 13 Nov 2022 20:57:44 GMT, Claes Redestad wrote:
>> src/hotspot/cpu/x86/x86_64.ad line 12073:
>>
>>> 12071: legRegD tmp_vec13, rRegI tmp1, rRegI tmp2,
>>> rRegI tmp3, rFlagsReg cr)
>>> 12072: %{
>>> 12073: predicate(UseAVX >= 2 &&
On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad wrote:
>> Continuing the work initiated by @luhenry to unroll and then intrinsify
>> polynomial hash loops.
>>
>> I've rewired the library changes to route via a single `@IntrinsicCandidate`
>> method. To make this work I've harmonized how they
On Tue, 20 Dec 2022 21:11:40 GMT, Claes Redestad wrote:
>> Continuing the work initiated by @luhenry to unroll and then intrinsify
>> polynomial hash loops.
>>
>> I've rewired the library changes to route via a single `@IntrinsicCandidate`
>> method. To make this work I've harmonized how they
On Tue, 20 Dec 2022 19:52:34 GMT, Claes Redestad wrote:
>> src/java.base/share/classes/java/lang/StringUTF16.java line 418:
>>
>>> 416: return 0;
>>> 417: } else {
>>> 418: return ArraysSupport.vectorizedHashCode(value,
>>> ArraysSupport.UTF16);
>>
>> Special
On Tue, 20 Dec 2022 21:11:18 GMT, Claes Redestad wrote:
>>> How far off is this ...?
>>
>> Back then it looked way too constrained (tight constraints on code shapes).
>> But I considered it as a generally applicable optimization.
>>
>>> ... do you think it'll be able to match the efficiency
On Wed, 21 Dec 2022 17:29:23 GMT, Claes Redestad wrote:
>> Continuing the work initiated by @luhenry to unroll and then intrinsify
>> polynomial hash loops.
>>
>> I've rewired the library changes to route via a single `@IntrinsicCandidate`
>> method. To make this work I've harmonized how they
On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad wrote:
>> Continuing the work initiated by @luhenry to unroll and then intrinsify
>> polynomial hash loops.
>>
>> I've rewired the library changes to route via a single `@IntrinsicCandidate`
>> method. To make this work I've harmonized how they
On Wed, 25 Jan 2023 15:03:05 GMT, Scott Gibbons wrote:
> Adding a performance benchmark test for CRC32. This does exactly the same
> test as for CRC32C.
test/micro/org/openjdk/bench/java/util/TestCRC32.java line 2:
> 1: /*
> 2: * Copyright (c) 2021, 2022, 2023, Oracle and/or its affiliates.
On Wed, 25 Jan 2023 23:07:49 GMT, Scott Gibbons wrote:
>> Adding a performance benchmark test for CRC32. This does exactly the same
>> test as for CRC32C.
>
> Scott Gibbons has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Fix copyright
On Mon, 6 Mar 2023 23:54:44 GMT, Vladimir Kozlov wrote:
>> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in
>> Interpreter and C1 compiler to produce the same results as C2 intrinsics on
>> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java
On Tue, 7 Mar 2023 00:52:37 GMT, Vladimir Kozlov wrote:
> Note, I removed `ConvF2HFNode::Identity()` optimization because tests show
> that it produces different NaN results due to skipped conversion.
Yes, removing the Identity optimization is correct. It doesn't hold for NaN
inputs.
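Concretely, a float-to-float16-to-float round trip cannot be an identity for NaNs: the 23-bit NaN payload does not survive the 10-bit half-precision significand, and signaling NaNs are quieted. A minimal demonstration, assuming a JDK 20+ runtime (where `Float.floatToFloat16` and `Float.float16ToFloat` exist):

```java
// Why ConvF2HF followed by ConvHF2F is not an identity for NaN inputs:
// the low payload bits of a float NaN are lost when narrowing to half
// precision, so the widened result has different raw bits.
public class Float16NaNRoundTrip {
    public static void main(String[] args) {
        int bits = 0x7f800123;            // a NaN with payload only in the low bits
        float f = Float.intBitsToFloat(bits);
        float back = Float.float16ToFloat(Float.floatToFloat16(f));
        System.out.println(Float.isNaN(back));                      // still a NaN
        System.out.println(Float.floatToRawIntBits(back) == bits);  // bits changed
    }
}
```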
On Fri, 3 Mar 2023 21:41:35 GMT, Vladimir Kozlov wrote:
> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in
> Interpreter and C1 compiler to produce the same results as C2 intrinsics on
> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java
>
On Tue, 7 Mar 2023 01:59:25 GMT, Vladimir Kozlov wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3931:
>>
>>> 3929: // For results consistency both intrinsics should be enabled.
>>> 3930: if
>>> (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_float16ToFloat) &&
>>>
On Fri, 3 Mar 2023 21:41:35 GMT, Vladimir Kozlov wrote:
> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in
> Interpreter and C1 compiler to produce the same results as C2 intrinsics on
> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java
>
On Tue, 7 Mar 2023 02:53:48 GMT, Vladimir Kozlov wrote:
>> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in
>> Interpreter and C1 compiler to produce the same results as C2 intrinsics on
>> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java
On Wed, 22 Feb 2023 04:03:02 GMT, David Holmes wrote:
>> Change the java/lang/float.java and the corresponding shared runtime
>> constant expression evaluation to generate QNaN.
>> The HW instructions generate QNaNs and not SNaNs for floating point
>> instructions. This happens across double,
On Wed, 22 Feb 2023 21:21:42 GMT, Vladimir Kozlov wrote:
>>> I'm also a bit concerned that we are rushing in to "fix" this. IIUC we have
>>> three mechanisms for implementing this functionality:
>>>
>>> 1. The interpreted Java code
>>>
>>> 2. The compiled non-intrinisc sharedRuntime
On Tue, 28 Feb 2023 15:59:26 GMT, Eirik Bjorsnos wrote:
> This PR suggests we add a vectorized equalsIgnoreCase benchmark to the set of
> benchmarks in `org.openjdk.bench.jdk.incubator.vector`. This benchmark serves
> as an example of how vectorization can be useful also in the area of text
>
On Tue, 28 Feb 2023 23:08:29 GMT, Eirik Bjorsnos wrote:
>> This PR suggests we add a vectorized equalsIgnoreCase benchmark to the set
>> of benchmarks in `org.openjdk.bench.jdk.incubator.vector`. This benchmark
>> serves as an example of how vectorization can be useful also in the area of
>>
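The benchmark exercises case-insensitive ASCII comparison, which vectorizes well because upper- and lower-case ASCII letters differ only in bit 0x20. A scalar sketch of that trick (illustrative only, not the benchmark's Vector API implementation):

```java
// Scalar form of the ASCII trick behind a vectorizable equalsIgnoreCase:
// two bytes match case-insensitively if they are equal after OR-ing in
// bit 0x20 *and* the OR-ed value is actually a lower-case letter.
public class AsciiEqualsIgnoreCase {
    static boolean equalsIgnoreCaseAscii(byte[] a, byte[] b) {
        if (a.length != b.length) return false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == b[i]) continue;
            int lx = a[i] | 0x20, ly = b[i] | 0x20;
            if (lx != ly || lx < 'a' || lx > 'z') return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoreCaseAscii("Hello".getBytes(), "hELLO".getBytes()));
        System.out.println(equalsIgnoreCaseAscii("Hello".getBytes(), "hELL0".getBytes()));
    }
}
```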
Change the java/lang/float.java and the corresponding shared runtime constant
expression evaluation to generate QNaN.
The HW instructions generate QNaNs and not SNaNs for floating point
instructions. This happens across double, float, and float16 data types. The
most significant bit of mantissa
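That quiet/signaling distinction lives in the most significant significand bit: set for a QNaN, clear for an SNaN. Java's canonical `Float.NaN` is a QNaN, which can be checked directly:

```java
// The quiet bit of a float NaN is the MSB of the 23-bit significand
// (0x00400000). Float.NaN is the canonical quiet NaN, 0x7fc00000.
public class QuietNaNBit {
    static final int QUIET_BIT = 0x0040_0000;

    public static void main(String[] args) {
        int nanBits = Float.floatToRawIntBits(Float.NaN);
        System.out.println(Integer.toHexString(nanBits));   // 7fc00000
        System.out.println((nanBits & QUIET_BIT) != 0);     // quiet bit is set
    }
}
```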
On Wed, 22 Feb 2023 02:08:27 GMT, Sandhya Viswanathan
wrote:
> Change the java/lang/float.java and the corresponding shared runtime constant
> expression evaluation to generate QNaN.
> The HW instructions generate QNaNs and not SNaNs for floating point
> instructions. This ha
On Tue, 7 Feb 2023 00:12:21 GMT, Scott Gibbons wrote:
>> Added code for Base64 acceleration (encode and decode) which will accelerate
>> ~4x for AVX2 platforms.
>>
>> Encode performance:
>> **Old:**
>>
>> Benchmark (maxNumBytes) Mode Cnt Score Error
>> Units
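The intrinsic accelerates the existing `java.util.Base64` codecs, so user code is unchanged. A quick round trip through the API being sped up:

```java
import java.util.Base64;

// The AVX2 stubs back the standard java.util.Base64 encoder/decoder;
// callers see the same API, just faster on supporting hardware.
public class Base64RoundTrip {
    public static void main(String[] args) {
        byte[] input = "hello, AVX2".getBytes();
        String encoded = Base64.getEncoder().encodeToString(input);
        byte[] decoded = Base64.getDecoder().decode(encoded);
        System.out.println(encoded);
        System.out.println(new String(decoded));   // hello, AVX2
    }
}
```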
On Tue, 7 Feb 2023 02:49:44 GMT, Sandhya Viswanathan
wrote:
>> Scott Gibbons has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Add algorithm comments
>
> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp l
On Thu, 9 Feb 2023 18:08:15 GMT, Scott Gibbons wrote:
>> Added code for Base64 acceleration (encode and decode) which will accelerate
>> ~4x for AVX2 platforms.
>>
>> Encode performance:
>> **Old:**
>>
>> Benchmark (maxNumBytes) Mode Cnt Score Error
>> Units
On Tue, 14 Feb 2023 22:41:47 GMT, Claes Redestad wrote:
>> Scott Gibbons has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Last of review comments
>
> I've started tier1-5 testing internally. Will let you know if we find any
> issues.
On Tue, 14 Feb 2023 15:03:49 GMT, Scott Gibbons wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2658:
>>
>>> 2656: // Check for buffer too small (for algorithm)
>>> 2657: __ subl(length, 0x2c);
>>> 2658: __ jcc(Assembler::lessEqual, L_tailProc);
>>
>> This could be
On Tue, 14 Feb 2023 15:19:34 GMT, Claes Redestad wrote:
>> Why? There is no performance difference and the intent is clear. Is this
>> just a "style" thing?
>
> I think with `lessEqual` we'll jump to `L_tailProc` for the final 32-byte
> chunk in inputs that are divisible by 32 (starting from
On Tue, 14 Feb 2023 18:22:32 GMT, Scott Gibbons wrote:
>> Added code for Base64 acceleration (encode and decode) which will accelerate
>> ~4x for AVX2 platforms.
>>
>> Encode performance:
>> **Old:**
>>
>> Benchmark (maxNumBytes) Mode Cnt Score Error
>> Units
On Sat, 1 Jul 2023 07:53:17 GMT, Swati Sharma wrote:
> The below benchmark files have scaling issues due to cache contention, which
> leads to poor scaling when run on multiple threads. The patch changes the
> scope from benchmark level to thread level to fix the issue:
> -
On Fri, 23 Jun 2023 16:43:32 GMT, Jatin Bhateja wrote:
> Backing out shuffle related overhaul done with
> [JDK-8304450](https://bugs.openjdk.org/browse/JDK-8304450), we saw
> significant performance degradation in VectorAPI JMH micros and some of our
> internal benchmarks. Following two
On Sat, 20 Jan 2024 09:55:45 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2
>> only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2
>> instruction set.
>> These are very frequently used APIs
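For reference, the compress/expand operations being optimized have simple lane-wise semantics: compress packs the mask-selected lanes into the low-order positions (zeroing the rest), and expand scatters low-order values back into the selected lanes. A scalar model only, not the Vector API implementation:

```java
import java.util.Arrays;

// Reference semantics of Vector API compress/expand: compress moves
// mask-selected lanes to the low end (remaining lanes zero); expand is
// the inverse, scattering low-order values into the selected lanes.
public class CompressExpandSketch {
    static int[] compress(int[] v, boolean[] mask) {
        int[] out = new int[v.length];
        int j = 0;
        for (int i = 0; i < v.length; i++) {
            if (mask[i]) out[j++] = v[i];
        }
        return out;
    }

    static int[] expand(int[] v, boolean[] mask) {
        int[] out = new int[v.length];
        int j = 0;
        for (int i = 0; i < v.length; i++) {
            if (mask[i]) out[i] = v[j++];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] v = {1, 2, 3, 4};
        boolean[] m = {true, false, true, false};
        System.out.println(Arrays.toString(compress(v, m)));
        System.out.println(Arrays.toString(expand(compress(v, m), m)));
    }
}
```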
On Wed, 7 Feb 2024 18:38:29 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially
>>
On Sun, 17 Dec 2023 13:25:00 GMT, Guoxiong Li wrote:
>> Hi all,
>>
>> This patch fixes the building failure introduced by
>> [JDK-8319577](https://bugs.openjdk.org/browse/JDK-8319577) in old GCC
>> version (linux & GCC 7.5.0 locally).
>>
>> Thanks for the review.
>>
>> Best Regards,
>> --
On Tue, 19 Dec 2023 19:08:08 GMT, Kim Barrett wrote:
>>> Have you tested with gcc 9? Or is this just supposition based on gcc9
>>> having removed the experimental
>> status for C++17?
>>
>> I have not tested GCC 8 and 9. @sviswa7 seems to test them.
>>
>>> I have verified that with the above
On Sun, 17 Dec 2023 13:25:00 GMT, Guoxiong Li wrote:
>> Hi all,
>>
>> This patch fixes the building failure introduced by
>> [JDK-8319577](https://bugs.openjdk.org/browse/JDK-8319577) in old GCC
>> version (linux & GCC 7.5.0 locally).
>>
>> Thanks for the review.
>>
>> Best Regards,
>> --
On Tue, 19 Dec 2023 02:22:05 GMT, Guoxiong Li wrote:
>> Guoxiong Li has updated the pull request with a new target base due to a
>> merge or a rebase. The incremental webrev excludes the unrelated changes
>> brought in by the merge/rebase. The pull request contains four additional
>> commits
On Tue, 19 Dec 2023 18:42:19 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Wed, 29 Nov 2023 15:01:32 GMT, Scott Gibbons wrote:
>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark
On Tue, 28 Nov 2023 20:52:35 GMT, Srinivas Vamsi Parasa
wrote:
>> Thanks Sandhya, will fix this issue.
>
> Thanks Sandhya for suggesting the change to use supports_simd_sort(BasicType
> bt). Please see the updated code upstreamed.
@vamsi-parasa Thanks, your changes look good to me.
On Mon, 4 Dec 2023 22:15:24 GMT, Srinivas Vamsi Parasa wrote:
>> The goal is to develop faster sort routines for x86_64 CPUs by taking
>> advantage of AVX2 instructions. This enhancement provides an order of
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>>
On Sat, 18 Nov 2023 01:21:09 GMT, Srinivas Vamsi Parasa
wrote:
>> The goal is to develop faster sort routines for x86_64 CPUs by taking
>> advantage of AVX2 instructions. This enhancement provides an order of
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>>
On Tue, 21 Nov 2023 15:14:28 GMT, Dalibor Topic wrote:
>> src/java.base/linux/native/libsimdsort/avx2-32bit-qsort.hpp line 3:
>>
>>> 1: /*
>>> 2: * Copyright (c) 2021, 2023, Intel Corporation. All rights reserved.
>>> 3: * Copyright (c) 2021 Serge Sans Paille. All rights reserved.
>>
>> Is
On Wed, 6 Dec 2023 17:48:04 GMT, Srinivas Vamsi Parasa wrote:
>> The goal is to develop faster sort routines for x86_64 CPUs by taking
>> advantage of AVX2 instructions. This enhancement provides an order of
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>>
On Wed, 6 Dec 2023 18:26:34 GMT, Vladimir Kozlov wrote:
>> @TobiHartmann @vnkozlov Please advise if we can go ahead and integrate this
>> PR today before the fork.
>
>> @TobiHartmann @vnkozlov Please advise if we can go ahead and integrate this
>> PR today before the fork.
>
> Too late. Changes
On Sun, 21 Jan 2024 06:55:43 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially
>>
On Wed, 31 Jan 2024 21:31:21 GMT, Sandhya Viswanathan
wrote:
>> Jatin Bhateja has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Review comments resolutions.
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
On Thu, 1 Feb 2024 16:24:16 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather using hybrid algorithm which initially
>>
On Fri, 19 Jan 2024 19:03:31 GMT, Jatin Bhateja wrote:
>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2
>> only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2
>> instruction set.
>> These are very frequently used APIs
On Mon, 6 Nov 2023 18:37:41 GMT, Sandhya Viswanathan
wrote:
>> match_rule_supported_vector called in the beginning will enforce these
>> checks.
>
> This method is match_rule_support_vector and it is not enforcing this check
> now. It was doing so before through fall thr
On Wed, 15 Nov 2023 02:17:58 GMT, Jatin Bhateja wrote:
>> Hi All,
>>
>> This patch optimizes sub-word gather operation for x86 targets with AVX2 and
>> AVX512 features.
>>
>> Following is the summary of changes:-
>>
>> 1) Intrinsify sub-word gather with high performance backend
On Tue, 14 Nov 2023 08:09:28 GMT, Jatin Bhateja wrote:
>> Below is baseline data collected using a modified version of the
>> java.lang.foreign.xor micro benchmark referenced by @mcimadamore in the bug
>> report. I collected data on an Ubuntu 22.04 laptop with a Tigerlake
>> i7-1185G7,
On Thu, 9 Nov 2023 22:08:06 GMT, Sandhya Viswanathan
wrote:
> Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and thus marked
> as flagless through @requires vm.flagless per
> [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566).
@lmesnik Could you plea
On Wed, 15 Nov 2023 01:07:23 GMT, Leonid Mesnik wrote:
>> Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and thus
>> marked as flagless through @requires vm.flagless per
>> [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566).
>
> Marked as reviewed by lmesnik (Reviewer).
On Thu, 9 Nov 2023 22:08:06 GMT, Sandhya Viswanathan
wrote:
> Test jdk/incubator/vector/LoadJsvmlTest.java ignores VM flags and thus marked
> as flagless through @requires vm.flagless per
> [JDK-8319566](https://bugs.openjdk.org/browse/JDK-8319566).
This pull request has now been i
On Mon, 20 Nov 2023 22:50:19 GMT, Steve Dohrmann wrote:
>> Update: the XorTest::xor results shown in this message used test code from
>> PR commit 7cc272e862791 which was based on Maurizio Cimadamore's commit
>> a788f066af17. The XorTest has since been updated and XorTest::copy is no
>>
On Tue, 21 Nov 2023 21:03:20 GMT, Steve Dohrmann wrote:
>> Update: the XorTest::xor results shown in this message used test code from
>> PR commit 7cc272e862791 which was based on Maurizio Cimadamore's commit
>> a788f066af17. The XorTest has since been updated and XorTest::copy is no
>>
On Fri, 12 Apr 2024 00:10:22 GMT, Sandhya Viswanathan
wrote:
>> Scott Gibbons has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Addressing yet more review comments
>
> src/hotspot/cpu/x86/stubGenerator_
On Fri, 12 Apr 2024 00:00:38 GMT, Scott Gibbons wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2751:
>>
>>> 2749: UnsafeSetMemoryMark usmm(this, true, true);
>>> 2750:
>>> 2751: __ generate_fill(T_BYTE, false, c_rarg0, c_rarg1, r11, rax,
>>> xmm0);
>>