subject:"RFR\: 8309130\: x86_64 AVX512 intrinsics for Arrays.sort methods \(int, long, float and double arrays\) \[v40\]"

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

2023-09-21 Thread Srinivas Vamsi Parasa

On Thu, 21 Sep 2023 16:44:52 GMT, Paul Sandoz  wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   change variable names of indexPivot* to pivotIndex*
>
> test/jdk/java/util/Arrays/Sorting.java line 30:
> 
>> 28:  * @build Sorting
>> 29:  * @run main/othervm -XX:+UnlockDiagnosticVMOptions 
>> -XX:DisableIntrinsic=_arraySort,_arrayPartition, Sorting -shortrun
>> 30:  * @run main/othervm -XX:CompileThreshold=1 -XX:-TieredCompilation 
>> Sorting -shortrun
> 
> It would be useful to target the compilation threshold only for the intrinsic 
> methods, but i suspect that is not possible?

Hi Paul,

Did some further study and found that it's possible to scale the compilation 
threshold only for the intrinsic methods (`sort `and `partition`)  using the 
`CompileCommand` and `CompileThersholdScaling` 

 ``` 
-XX:-TieredCompilation 
-XX:CompileCommand=CompileThresholdScaling,java.util.DualPivotQuicksort::sort,0.0001
  
-XX:CompileCommand=CompileThresholdScaling,java.util.DualPivotQuicksort::partition,0.0001


Will push this change along with build script change suggested by Magnus.

Thanks,
Vamsi

-

PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1333797193

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

2023-09-21 Thread Srinivas Vamsi Parasa

On Thu, 21 Sep 2023 09:32:18 GMT, Magnus Ihse Bursie  wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   change variable names of indexPivot* to pivotIndex*
>
> make/modules/java.base/Lib.gmk line 240:
> 
>> 238: 
>> 239: ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, 
>> x86_64)+$(INCLUDE_COMPILER2), true+true+true)
>> 240:   ifeq ($(TOOLCHAIN_TYPE), gcc)
> 
> This requirement can be folded into the "and" sequence in the `ifeq` 
> statement above. There is nothing that really merits it to be specified 
> separately.

Hello Magnus (@magicus),

As suggested, made the modification to use a single `ifeq` (which works for me 
on a Linux machine). Could you please confirm if the approach below is correct? 

`ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, 
x86_64)+$(INCLUDE_COMPILER2)+$(filter $(TOOLCHAIN_TYPE), gcc), 
true+true+true+gcc)`

Thanks,
Vamsi

-

PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1333450204

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

2023-09-21 Thread Paul Sandoz

On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa  
wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX512 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and 
>> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the 
>> performance data below.
>> 
>> 
>> **Arrays.sort performance data using JMH benchmarks for arrays with random 
>> data** 
>> 
>> |Arrays.sort benchmark   |   Array Size  |   Baseline 
>> (us/op)|   AVX512 Sort (us/op) |   Speedup |
>> |--- |   --- |   --- |   --- |   --- 
>> |
>> |ArraysSort.doubleSort   |   10  |   0.034   |   0.035   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   25  |   0.116   |   0.089   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   50  |   0.282   |   0.291   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   75  |   0.474   |   0.358   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   100 |   0.654   |   0.623   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   1000|   9.274   |   6.331   
>> |   1.5 |
>> |ArraysSort.doubleSort   |   1   |   323.339 |   71.228  
>> |   **4.5** |
>> |ArraysSort.doubleSort   |   10  |   4471.871|   
>> 1002.748|   **4.5** |
>> |ArraysSort.doubleSort   |   100 |   51660.742   |   
>> 12921.295   |   **4.0** |
>> |ArraysSort.floatSort|   10  |   0.045   |   0.046   
>> |   1.0 |
>> |ArraysSort.floatSort|   25  |   0.103   |   0.084   
>> |   1.2 |
>> |ArraysSort.floatSort|   50  |   0.285   |   0.33
>> |   0.9 |
>> |ArraysSort.floatSort|   75  |   0.492   |   0.346   
>> |   1.4 |
>> |ArraysSort.floatSort|   100 |   0.597   |   0.326   
>> |   1.8 |
>> |ArraysSort.floatSort|   1000|   9.811   |   5.294   
>> |   1.9 |
>> |ArraysSort.floatSort|   1   |   323.955 |   50.547  
>> |   **6.4** |
>> |ArraysSort.floatSort|   10  |   4326.38 |   731.152 
>> |   **5.9** |
>> |ArraysSort.floatSort|   100 |   52413.88|   
>> 8409.193|   **6.2** |
>> |ArraysSort.intSort  |   10  |   0.033   |   0.033   
>> |   1.0 |
>> |ArraysSort.intSort  |   25  |   0.086   |   0.051   
>> |   1.7 |
>> |ArraysSort.intSort  |   50  |   0.236   |   0.151   
>> |   1.6 |
>> |ArraysSort.intSort  |   75  |   0.416   |   0.332   
>> |   1.3 |
>> |ArraysSort.intSort  |   100 |   0.63|   0.521   
>> |   1.2 |
>> |ArraysSort.intSort  |   1000|   10.518  |   4.698   
>> |   2.2 |
>> |ArraysSort.intSort  |   1   |   309.659 |   42.518  
>> |   **7.3** |
>> |ArraysSort.intSort  |   10  |   4130.917|   
>> 573.956 |   **7.2** |
>> |ArraysSort.intSort  |   100 |   49876.307   |   
>> 6712.812|   **7.4** |
>> |ArraysSort.longSort |   10  |   0.036   |   0.037   
>> |   1.0 |
>> |ArraysSort.longSort |   25  |   0.094   |   0.08
>> |   1.2 |
>> |ArraysSort.longSort |   50  |   0.218   |   0.227   
>> |   1.0 |
>> |ArraysSort.longSort |   75  |   0.466   |   0.402   
>> |   1.2 |
>> |ArraysSort.longSort |   100 |   0.76|   0.58
>> |   1.3 |
>> |ArraysSort.longSort |   1000|   10.449  |   6
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   change variable names of indexPivot* to pivotIndex*

The Java changes look good to me, it slots in very neatly with the same 
space/memory characteristics. It's generally pleasing to me to see recent data 
parallel sorting techniques from academic literature applied in the Java 
platform.

I wish it were easier to support different OS platforms (e.g., MacOS with x86) 
but its a compromise on complexity of the sort implementations. 

I wonder if it makes sense to adjust the insertion sort thresholds when the 
intrinsics are enabled. That could be revisited later.

-

Marked as reviewed by psandoz (Reviewer).

PR Review:

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

2023-09-21 Thread Paul Sandoz

On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa  
wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX512 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and 
>> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the 
>> performance data below.
>> 
>> 
>> **Arrays.sort performance data using JMH benchmarks for arrays with random 
>> data** 
>> 
>> |Arrays.sort benchmark   |   Array Size  |   Baseline 
>> (us/op)|   AVX512 Sort (us/op) |   Speedup |
>> |--- |   --- |   --- |   --- |   --- 
>> |
>> |ArraysSort.doubleSort   |   10  |   0.034   |   0.035   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   25  |   0.116   |   0.089   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   50  |   0.282   |   0.291   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   75  |   0.474   |   0.358   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   100 |   0.654   |   0.623   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   1000|   9.274   |   6.331   
>> |   1.5 |
>> |ArraysSort.doubleSort   |   1   |   323.339 |   71.228  
>> |   **4.5** |
>> |ArraysSort.doubleSort   |   10  |   4471.871|   
>> 1002.748|   **4.5** |
>> |ArraysSort.doubleSort   |   100 |   51660.742   |   
>> 12921.295   |   **4.0** |
>> |ArraysSort.floatSort|   10  |   0.045   |   0.046   
>> |   1.0 |
>> |ArraysSort.floatSort|   25  |   0.103   |   0.084   
>> |   1.2 |
>> |ArraysSort.floatSort|   50  |   0.285   |   0.33
>> |   0.9 |
>> |ArraysSort.floatSort|   75  |   0.492   |   0.346   
>> |   1.4 |
>> |ArraysSort.floatSort|   100 |   0.597   |   0.326   
>> |   1.8 |
>> |ArraysSort.floatSort|   1000|   9.811   |   5.294   
>> |   1.9 |
>> |ArraysSort.floatSort|   1   |   323.955 |   50.547  
>> |   **6.4** |
>> |ArraysSort.floatSort|   10  |   4326.38 |   731.152 
>> |   **5.9** |
>> |ArraysSort.floatSort|   100 |   52413.88|   
>> 8409.193|   **6.2** |
>> |ArraysSort.intSort  |   10  |   0.033   |   0.033   
>> |   1.0 |
>> |ArraysSort.intSort  |   25  |   0.086   |   0.051   
>> |   1.7 |
>> |ArraysSort.intSort  |   50  |   0.236   |   0.151   
>> |   1.6 |
>> |ArraysSort.intSort  |   75  |   0.416   |   0.332   
>> |   1.3 |
>> |ArraysSort.intSort  |   100 |   0.63|   0.521   
>> |   1.2 |
>> |ArraysSort.intSort  |   1000|   10.518  |   4.698   
>> |   2.2 |
>> |ArraysSort.intSort  |   1   |   309.659 |   42.518  
>> |   **7.3** |
>> |ArraysSort.intSort  |   10  |   4130.917|   
>> 573.956 |   **7.2** |
>> |ArraysSort.intSort  |   100 |   49876.307   |   
>> 6712.812|   **7.4** |
>> |ArraysSort.longSort |   10  |   0.036   |   0.037   
>> |   1.0 |
>> |ArraysSort.longSort |   25  |   0.094   |   0.08
>> |   1.2 |
>> |ArraysSort.longSort |   50  |   0.218   |   0.227   
>> |   1.0 |
>> |ArraysSort.longSort |   75  |   0.466   |   0.402   
>> |   1.2 |
>> |ArraysSort.longSort |   100 |   0.76|   0.58
>> |   1.3 |
>> |ArraysSort.longSort |   1000|   10.449  |   6
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   change variable names of indexPivot* to pivotIndex*

test/jdk/java/util/Arrays/Sorting.java line 30:

> 28:  * @build Sorting
> 29:  * @run main/othervm -XX:+UnlockDiagnosticVMOptions 
> -XX:DisableIntrinsic=_arraySort,_arrayPartition, Sorting -shortrun
> 30:  * @run main/othervm -XX:CompileThreshold=1 -XX:-TieredCompilation 
> Sorting -shortrun

It would be useful to target the compilation threshold only for the intrinsic 
methods, but i suspect that is not possible?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r151101

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

2023-09-21 Thread Magnus Ihse Bursie

On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa  
wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX512 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and 
>> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the 
>> performance data below.
>> 
>> 
>> **Arrays.sort performance data using JMH benchmarks for arrays with random 
>> data** 
>> 
>> |Arrays.sort benchmark   |   Array Size  |   Baseline 
>> (us/op)|   AVX512 Sort (us/op) |   Speedup |
>> |--- |   --- |   --- |   --- |   --- 
>> |
>> |ArraysSort.doubleSort   |   10  |   0.034   |   0.035   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   25  |   0.116   |   0.089   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   50  |   0.282   |   0.291   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   75  |   0.474   |   0.358   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   100 |   0.654   |   0.623   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   1000|   9.274   |   6.331   
>> |   1.5 |
>> |ArraysSort.doubleSort   |   1   |   323.339 |   71.228  
>> |   **4.5** |
>> |ArraysSort.doubleSort   |   10  |   4471.871|   
>> 1002.748|   **4.5** |
>> |ArraysSort.doubleSort   |   100 |   51660.742   |   
>> 12921.295   |   **4.0** |
>> |ArraysSort.floatSort|   10  |   0.045   |   0.046   
>> |   1.0 |
>> |ArraysSort.floatSort|   25  |   0.103   |   0.084   
>> |   1.2 |
>> |ArraysSort.floatSort|   50  |   0.285   |   0.33
>> |   0.9 |
>> |ArraysSort.floatSort|   75  |   0.492   |   0.346   
>> |   1.4 |
>> |ArraysSort.floatSort|   100 |   0.597   |   0.326   
>> |   1.8 |
>> |ArraysSort.floatSort|   1000|   9.811   |   5.294   
>> |   1.9 |
>> |ArraysSort.floatSort|   1   |   323.955 |   50.547  
>> |   **6.4** |
>> |ArraysSort.floatSort|   10  |   4326.38 |   731.152 
>> |   **5.9** |
>> |ArraysSort.floatSort|   100 |   52413.88|   
>> 8409.193|   **6.2** |
>> |ArraysSort.intSort  |   10  |   0.033   |   0.033   
>> |   1.0 |
>> |ArraysSort.intSort  |   25  |   0.086   |   0.051   
>> |   1.7 |
>> |ArraysSort.intSort  |   50  |   0.236   |   0.151   
>> |   1.6 |
>> |ArraysSort.intSort  |   75  |   0.416   |   0.332   
>> |   1.3 |
>> |ArraysSort.intSort  |   100 |   0.63|   0.521   
>> |   1.2 |
>> |ArraysSort.intSort  |   1000|   10.518  |   4.698   
>> |   2.2 |
>> |ArraysSort.intSort  |   1   |   309.659 |   42.518  
>> |   **7.3** |
>> |ArraysSort.intSort  |   10  |   4130.917|   
>> 573.956 |   **7.2** |
>> |ArraysSort.intSort  |   100 |   49876.307   |   
>> 6712.812|   **7.4** |
>> |ArraysSort.longSort |   10  |   0.036   |   0.037   
>> |   1.0 |
>> |ArraysSort.longSort |   25  |   0.094   |   0.08
>> |   1.2 |
>> |ArraysSort.longSort |   50  |   0.218   |   0.227   
>> |   1.0 |
>> |ArraysSort.longSort |   75  |   0.466   |   0.402   
>> |   1.2 |
>> |ArraysSort.longSort |   100 |   0.76|   0.58
>> |   1.3 |
>> |ArraysSort.longSort |   1000|   10.449  |   6
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   change variable names of indexPivot* to pivotIndex*

make/modules/java.base/Lib.gmk line 240:

> 238: 
> 239: ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, 
> x86_64)+$(INCLUDE_COMPILER2), true+true+true)
> 240:   ifeq ($(TOOLCHAIN_TYPE), gcc)

This requirement can be folded into the "and" sequence in the `ifeq` statement 
above. There is nothing that really merits it to be specified separately.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1332770004

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

2023-09-21 Thread iaroslavski

On Wed, 20 Sep 2023 22:46:16 GMT, Srinivas Vamsi Parasa  
wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   change variable names of indexPivot* to pivotIndex*
>
> Hi Vladimir,
> 
> Just trying to understand: is there a reason to use 
> `DualPivotQuicksort_RadixForParallel.java` and 
> `DualPivotQuicksort_RadixForAll.java`?
> 
> Would it not be sufficient to do the following two runs:
> 
> 1. Baseline (Stock JDK) vs. AVX512 sort for` sort() `and `parallelSort()` ?
> 2. AVX512 sort vs. Radix sort for `sort()` and `parallelSort()` ?
> 
>> [1] current implementation in JDK [2] your AVX12 version based on [1], from 
>> this PR [3] my new version with Radix sort for parallel case plus your AVX12 
>> changes [4] my new version with Radix sort for all cases plus your AVX12 
>> changes 
> 
> Thanks,
> Vamsi

Hi @vamsi-parasa , @sviswa7 and @jatin-bhateja !

I agree with you that bot PRs are independent and radix sort discussion should 
not be here, my fault. I asked to run benchmarking because actual AVX512 
changes are in C++ code and I don't have ready environment for testing.
I'm moving this discussion to PR https://github.com/openjdk/jdk/pull/13568.

Many thanks,
Vladimir

-

PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1728996248

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

2023-09-20 Thread Jatin Bhateja

On Wed, 20 Sep 2023 22:46:16 GMT, Srinivas Vamsi Parasa  
wrote:

> Hi Vladimir,
> 
> Just trying to understand: is there a reason to use 
> `DualPivotQuicksort_RadixForParallel.java` and 
> `DualPivotQuicksort_RadixForAll.java`?
> 
> Would it not be sufficient to do the following two runs:
> 
> 1. Baseline (Stock JDK) vs. AVX512 sort for`sort()`and `parallelSort()` ?
> 2. AVX512 sort vs. Radix sort for `sort()` and `parallelSort()` ?
> 
> > [1] current implementation in JDK [2] your AVX12 version based on [1], from 
> > this PR [3] my new version with Radix sort for parallel case plus your 
> > AVX12 changes [4] my new version with Radix sort for all cases plus your 
> > AVX12 changes
> 
> Thanks, Vamsi

Hi @vamsi-parasa , Please also add C1 intrinsification support for newly carved 
sort / partition intrinsics in your follow up PR 
https://bugs.openjdk.org/browse/JDK-8315382 , it may further improve the 
overall runtime for large sized array sort in tiered compilation mode.

-

PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1728877814

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

2023-09-20 Thread Sandhya Viswanathan

On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa  
wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX512 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and 
>> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the 
>> performance data below.
>> 
>> 
>> **Arrays.sort performance data using JMH benchmarks for arrays with random 
>> data** 
>> 
>> |Arrays.sort benchmark   |   Array Size  |   Baseline 
>> (us/op)|   AVX512 Sort (us/op) |   Speedup |
>> |--- |   --- |   --- |   --- |   --- 
>> |
>> |ArraysSort.doubleSort   |   10  |   0.034   |   0.035   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   25  |   0.116   |   0.089   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   50  |   0.282   |   0.291   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   75  |   0.474   |   0.358   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   100 |   0.654   |   0.623   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   1000|   9.274   |   6.331   
>> |   1.5 |
>> |ArraysSort.doubleSort   |   1   |   323.339 |   71.228  
>> |   **4.5** |
>> |ArraysSort.doubleSort   |   10  |   4471.871|   
>> 1002.748|   **4.5** |
>> |ArraysSort.doubleSort   |   100 |   51660.742   |   
>> 12921.295   |   **4.0** |
>> |ArraysSort.floatSort|   10  |   0.045   |   0.046   
>> |   1.0 |
>> |ArraysSort.floatSort|   25  |   0.103   |   0.084   
>> |   1.2 |
>> |ArraysSort.floatSort|   50  |   0.285   |   0.33
>> |   0.9 |
>> |ArraysSort.floatSort|   75  |   0.492   |   0.346   
>> |   1.4 |
>> |ArraysSort.floatSort|   100 |   0.597   |   0.326   
>> |   1.8 |
>> |ArraysSort.floatSort|   1000|   9.811   |   5.294   
>> |   1.9 |
>> |ArraysSort.floatSort|   1   |   323.955 |   50.547  
>> |   **6.4** |
>> |ArraysSort.floatSort|   10  |   4326.38 |   731.152 
>> |   **5.9** |
>> |ArraysSort.floatSort|   100 |   52413.88|   
>> 8409.193|   **6.2** |
>> |ArraysSort.intSort  |   10  |   0.033   |   0.033   
>> |   1.0 |
>> |ArraysSort.intSort  |   25  |   0.086   |   0.051   
>> |   1.7 |
>> |ArraysSort.intSort  |   50  |   0.236   |   0.151   
>> |   1.6 |
>> |ArraysSort.intSort  |   75  |   0.416   |   0.332   
>> |   1.3 |
>> |ArraysSort.intSort  |   100 |   0.63|   0.521   
>> |   1.2 |
>> |ArraysSort.intSort  |   1000|   10.518  |   4.698   
>> |   2.2 |
>> |ArraysSort.intSort  |   1   |   309.659 |   42.518  
>> |   **7.3** |
>> |ArraysSort.intSort  |   10  |   4130.917|   
>> 573.956 |   **7.2** |
>> |ArraysSort.intSort  |   100 |   49876.307   |   
>> 6712.812|   **7.4** |
>> |ArraysSort.longSort |   10  |   0.036   |   0.037   
>> |   1.0 |
>> |ArraysSort.longSort |   25  |   0.094   |   0.08
>> |   1.2 |
>> |ArraysSort.longSort |   50  |   0.218   |   0.227   
>> |   1.0 |
>> |ArraysSort.longSort |   75  |   0.466   |   0.402   
>> |   1.2 |
>> |ArraysSort.longSort |   100 |   0.76|   0.58
>> |   1.3 |
>> |ArraysSort.longSort |   1000|   10.449  |   6
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   change variable names of indexPivot* to pivotIndex*

> Hi Vamsi,
> 
> In this comment [#13568 
> (comment)](https://github.com/openjdk/jdk/pull/13568#issuecomment-1728082819) 
> Paul suggested comparing of performance.
> 
> Could you please run benchmarking of the following 4 class?
> 
> [1] current implementation in JDK [2] your AVX12 version based on [1], from 
> this PR [3] my new version with Radix sort for parallel case plus your AVX12 
> changes 
> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_RadixForParallel.java
>  [4] my new version with Radix sort for all cases plus your AVX12 changes 
>

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

2023-09-20 Thread Srinivas Vamsi Parasa

On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa  
wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX512 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and 
>> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the 
>> performance data below.
>> 
>> 
>> **Arrays.sort performance data using JMH benchmarks for arrays with random 
>> data** 
>> 
>> |Arrays.sort benchmark   |   Array Size  |   Baseline 
>> (us/op)|   AVX512 Sort (us/op) |   Speedup |
>> |--- |   --- |   --- |   --- |   --- 
>> |
>> |ArraysSort.doubleSort   |   10  |   0.034   |   0.035   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   25  |   0.116   |   0.089   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   50  |   0.282   |   0.291   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   75  |   0.474   |   0.358   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   100 |   0.654   |   0.623   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   1000|   9.274   |   6.331   
>> |   1.5 |
>> |ArraysSort.doubleSort   |   1   |   323.339 |   71.228  
>> |   **4.5** |
>> |ArraysSort.doubleSort   |   10  |   4471.871|   
>> 1002.748|   **4.5** |
>> |ArraysSort.doubleSort   |   100 |   51660.742   |   
>> 12921.295   |   **4.0** |
>> |ArraysSort.floatSort|   10  |   0.045   |   0.046   
>> |   1.0 |
>> |ArraysSort.floatSort|   25  |   0.103   |   0.084   
>> |   1.2 |
>> |ArraysSort.floatSort|   50  |   0.285   |   0.33
>> |   0.9 |
>> |ArraysSort.floatSort|   75  |   0.492   |   0.346   
>> |   1.4 |
>> |ArraysSort.floatSort|   100 |   0.597   |   0.326   
>> |   1.8 |
>> |ArraysSort.floatSort|   1000|   9.811   |   5.294   
>> |   1.9 |
>> |ArraysSort.floatSort|   1   |   323.955 |   50.547  
>> |   **6.4** |
>> |ArraysSort.floatSort|   10  |   4326.38 |   731.152 
>> |   **5.9** |
>> |ArraysSort.floatSort|   100 |   52413.88|   
>> 8409.193|   **6.2** |
>> |ArraysSort.intSort  |   10  |   0.033   |   0.033   
>> |   1.0 |
>> |ArraysSort.intSort  |   25  |   0.086   |   0.051   
>> |   1.7 |
>> |ArraysSort.intSort  |   50  |   0.236   |   0.151   
>> |   1.6 |
>> |ArraysSort.intSort  |   75  |   0.416   |   0.332   
>> |   1.3 |
>> |ArraysSort.intSort  |   100 |   0.63|   0.521   
>> |   1.2 |
>> |ArraysSort.intSort  |   1000|   10.518  |   4.698   
>> |   2.2 |
>> |ArraysSort.intSort  |   1   |   309.659 |   42.518  
>> |   **7.3** |
>> |ArraysSort.intSort  |   10  |   4130.917|   
>> 573.956 |   **7.2** |
>> |ArraysSort.intSort  |   100 |   49876.307   |   
>> 6712.812|   **7.4** |
>> |ArraysSort.longSort |   10  |   0.036   |   0.037   
>> |   1.0 |
>> |ArraysSort.longSort |   25  |   0.094   |   0.08
>> |   1.2 |
>> |ArraysSort.longSort |   50  |   0.218   |   0.227   
>> |   1.0 |
>> |ArraysSort.longSort |   75  |   0.466   |   0.402   
>> |   1.2 |
>> |ArraysSort.longSort |   100 |   0.76|   0.58
>> |   1.3 |
>> |ArraysSort.longSort |   1000|   10.449  |   6
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   change variable names of indexPivot* to pivotIndex*

Hi Vladimir,

Sure, will run the benchmarks and post the JMH performance data.
Just trying to understand: is there a reason to use 
`DualPivotQuicksort_RadixForParallel.java` and 
`DualPivotQuicksort_RadixForAll.java`?

Would it not be sufficient to do the following two runs:

1. Baseline (Stock JDK) vs. AVX512 sort for` sort() `and `parallelSort()` ?
2. AVX512 sort vs. Radix sort for `sort()` and `parallelSort(`) ?

> [1] current implementation in JDK [2] your AVX12 version based on [1], from 
> this PR [3] my new version with Radix sort for parallel case plus your AVX12 
>

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

2023-09-20 Thread iaroslavski

On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa  
wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX512 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and 
>> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the 
>> performance data below.
>> 
>> 
>> **Arrays.sort performance data using JMH benchmarks for arrays with random 
>> data** 
>> 
>> |Arrays.sort benchmark   |   Array Size  |   Baseline 
>> (us/op)|   AVX512 Sort (us/op) |   Speedup |
>> |--- |   --- |   --- |   --- |   --- 
>> |
>> |ArraysSort.doubleSort   |   10  |   0.034   |   0.035   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   25  |   0.116   |   0.089   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   50  |   0.282   |   0.291   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   75  |   0.474   |   0.358   
>> |   1.3 |
>> |ArraysSort.doubleSort   |   100 |   0.654   |   0.623   
>> |   1.0 |
>> |ArraysSort.doubleSort   |   1000|   9.274   |   6.331   
>> |   1.5 |
>> |ArraysSort.doubleSort   |   1   |   323.339 |   71.228  
>> |   **4.5** |
>> |ArraysSort.doubleSort   |   10  |   4471.871|   
>> 1002.748|   **4.5** |
>> |ArraysSort.doubleSort   |   100 |   51660.742   |   
>> 12921.295   |   **4.0** |
>> |ArraysSort.floatSort|   10  |   0.045   |   0.046   
>> |   1.0 |
>> |ArraysSort.floatSort|   25  |   0.103   |   0.084   
>> |   1.2 |
>> |ArraysSort.floatSort|   50  |   0.285   |   0.33
>> |   0.9 |
>> |ArraysSort.floatSort|   75  |   0.492   |   0.346   
>> |   1.4 |
>> |ArraysSort.floatSort|   100 |   0.597   |   0.326   
>> |   1.8 |
>> |ArraysSort.floatSort|   1000|   9.811   |   5.294   
>> |   1.9 |
>> |ArraysSort.floatSort|   1   |   323.955 |   50.547  
>> |   **6.4** |
>> |ArraysSort.floatSort|   10  |   4326.38 |   731.152 
>> |   **5.9** |
>> |ArraysSort.floatSort|   100 |   52413.88|   
>> 8409.193|   **6.2** |
>> |ArraysSort.intSort  |   10  |   0.033   |   0.033   
>> |   1.0 |
>> |ArraysSort.intSort  |   25  |   0.086   |   0.051   
>> |   1.7 |
>> |ArraysSort.intSort  |   50  |   0.236   |   0.151   
>> |   1.6 |
>> |ArraysSort.intSort  |   75  |   0.416   |   0.332   
>> |   1.3 |
>> |ArraysSort.intSort  |   100 |   0.63|   0.521   
>> |   1.2 |
>> |ArraysSort.intSort  |   1000|   10.518  |   4.698   
>> |   2.2 |
>> |ArraysSort.intSort  |   1   |   309.659 |   42.518  
>> |   **7.3** |
>> |ArraysSort.intSort  |   10  |   4130.917|   
>> 573.956 |   **7.2** |
>> |ArraysSort.intSort  |   100 |   49876.307   |   
>> 6712.812|   **7.4** |
>> |ArraysSort.longSort |   10  |   0.036   |   0.037   
>> |   1.0 |
>> |ArraysSort.longSort |   25  |   0.094   |   0.08
>> |   1.2 |
>> |ArraysSort.longSort |   50  |   0.218   |   0.227   
>> |   1.0 |
>> |ArraysSort.longSort |   75  |   0.466   |   0.402   
>> |   1.2 |
>> |ArraysSort.longSort |   100 |   0.76|   0.58
>> |   1.3 |
>> |ArraysSort.longSort |   1000|   10.449  |   6
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   change variable names of indexPivot* to pivotIndex*

Hi Vamsi,

In this comment 
https://github.com/openjdk/jdk/pull/13568#issuecomment-1728082819
Paul suggested comparing of performance.

Could you please run benchmarking of the following 4 class?

[1] current implementation in JDK
[2] your AVX12 version based on [1], from this PR
[3] my new version with Radix sort for parallel case plus your AVX12 changes
 
https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_RadixForParallel.java
[4] my new version with Radix sort for all cases plus your AVX12 changes

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

2023-09-20 Thread Srinivas Vamsi Parasa

> The goal is to develop faster sort routines for x86_64 CPUs by taking 
> advantage of AVX512 instructions. This enhancement provides an order of 
> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
> 
> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto 
> ~4.5x improvement for 64-bit datatypes (long, double) as shown in the 
> performance data below.
> 
> 
> **Arrays.sort performance data using JMH benchmarks for arrays with random 
> data** 
> 
> | Arrays.sort benchmark   |   Array Size  |   Baseline 
> (us/op)|   AVX512 Sort (us/op) |   Speedup |
> | --- |   --- |   --- |   --- |   --- 
> |
> | ArraysSort.doubleSort   |   10  |   0.034   |   0.035   
> |   1.0 |
> | ArraysSort.doubleSort   |   25  |   0.116   |   0.089   
> |   1.3 |
> | ArraysSort.doubleSort   |   50  |   0.282   |   0.291   
> |   1.0 |
> | ArraysSort.doubleSort   |   75  |   0.474   |   0.358   
> |   1.3 |
> | ArraysSort.doubleSort   |   100 |   0.654   |   0.623   
> |   1.0 |
> | ArraysSort.doubleSort   |   1000|   9.274   |   6.331   
> |   1.5 |
> | ArraysSort.doubleSort   |   1   |   323.339 |   71.228  
> |   **4.5** |
> | ArraysSort.doubleSort   |   10  |   4471.871|   
> 1002.748|   **4.5** |
> | ArraysSort.doubleSort   |   100 |   51660.742   |   
> 12921.295   |   **4.0** |
> | ArraysSort.floatSort|   10  |   0.045   |   0.046   
> |   1.0 |
> | ArraysSort.floatSort|   25  |   0.103   |   0.084   
> |   1.2 |
> | ArraysSort.floatSort|   50  |   0.285   |   0.33
> |   0.9 |
> | ArraysSort.floatSort|   75  |   0.492   |   0.346   
> |   1.4 |
> | ArraysSort.floatSort|   100 |   0.597   |   0.326   
> |   1.8 |
> | ArraysSort.floatSort|   1000|   9.811   |   5.294   
> |   1.9 |
> | ArraysSort.floatSort|   1   |   323.955 |   50.547  
> |   **6.4** |
> | ArraysSort.floatSort|   10  |   4326.38 |   731.152 
> |   **5.9** |
> | ArraysSort.floatSort|   100 |   52413.88|   
> 8409.193|   **6.2** |
> | ArraysSort.intSort  |   10  |   0.033   |   0.033   
> |   1.0 |
> | ArraysSort.intSort  |   25  |   0.086   |   0.051   
> |   1.7 |
> | ArraysSort.intSort  |   50  |   0.236   |   0.151   
> |   1.6 |
> | ArraysSort.intSort  |   75  |   0.416   |   0.332   
> |   1.3 |
> | ArraysSort.intSort  |   100 |   0.63|   0.521   
> |   1.2 |
> | ArraysSort.intSort  |   1000|   10.518  |   4.698   
> |   2.2 |
> | ArraysSort.intSort  |   1   |   309.659 |   42.518  
> |   **7.3** |
> | ArraysSort.intSort  |   10  |   4130.917|   
> 573.956 |   **7.2** |
> | ArraysSort.intSort  |   100 |   49876.307   |   
> 6712.812|   **7.4** |
> | ArraysSort.longSort |   10  |   0.036   |   0.037   
> |   1.0 |
> | ArraysSort.longSort |   25  |   0.094   |   0.08
> |   1.2 |
> | ArraysSort.longSort |   50  |   0.218   |   0.227   
> |   1.0 |
> | ArraysSort.longSort |   75  |   0.466   |   0.402   
> |   1.2 |
> | ArraysSort.longSort |   100 |   0.76|   0.58
> |   1.3 |
> | ArraysSort.longSort |   1000|   10.449  |   6.239   
> |   1.7 |
> | ArraysSort.longSort |   1   |   307.074 |   70.284  
> |   **4.4** |
> | ArraysSor...

Srinivas Vamsi Parasa has updated the pull request incrementally with one 
additional commit since the last revision:

  change variable names of indexPivot* to pivotIndex*

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/14227/files
  - new: https://git.openjdk.org/jdk/pull/14227/files/3e0b8cfc..b04cb6c3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=14227=39
 - incr: https://webrevs.openjdk.org/?repo=jdk=14227=38-39

  Stats: 49 lines in 1 file changed: 0 ins; 6 del; 43 mod
  Patch: https://git.openjdk.org/jdk/pull/14227.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227

PR: https://git.openjdk.org/jdk/pull/14227

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]

11 matches

Site Navigation

Mail list logo

Footer information