Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]
On Thu, 21 Sep 2023 16:44:52 GMT, Paul Sandoz wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one >> additional commit since the last revision: >> >> change variable names of indexPivot* to pivotIndex* > > test/jdk/java/util/Arrays/Sorting.java line 30: > >> 28: * @build Sorting >> 29: * @run main/othervm -XX:+UnlockDiagnosticVMOptions >> -XX:DisableIntrinsic=_arraySort,_arrayPartition, Sorting -shortrun >> 30: * @run main/othervm -XX:CompileThreshold=1 -XX:-TieredCompilation >> Sorting -shortrun > > It would be useful to target the compilation threshold only for the intrinsic > methods, but i suspect that is not possible? Hi Paul, Did some further study and found that it's possible to scale the compilation threshold only for the intrinsic methods (`sort `and `partition`) using the `CompileCommand` and `CompileThersholdScaling` ``` -XX:-TieredCompilation -XX:CompileCommand=CompileThresholdScaling,java.util.DualPivotQuicksort::sort,0.0001 -XX:CompileCommand=CompileThresholdScaling,java.util.DualPivotQuicksort::partition,0.0001 Will push this change along with build script change suggested by Magnus. Thanks, Vamsi - PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1333797193
Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]
On Thu, 21 Sep 2023 09:32:18 GMT, Magnus Ihse Bursie wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one >> additional commit since the last revision: >> >> change variable names of indexPivot* to pivotIndex* > > make/modules/java.base/Lib.gmk line 240: > >> 238: >> 239: ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, >> x86_64)+$(INCLUDE_COMPILER2), true+true+true) >> 240: ifeq ($(TOOLCHAIN_TYPE), gcc) > > This requirement can be folded into the "and" sequence in the `ifeq` > statement above. There is nothing that really merits it to be specified > separately. Hello Magnus (@magicus), As suggested, made the modification to use a single `ifeq` (which works for me on a Linux machine). Could you please confirm if the approach below is correct? `ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, x86_64)+$(INCLUDE_COMPILER2)+$(filter $(TOOLCHAIN_TYPE), gcc), true+true+true+gcc)` Thanks, Vamsi - PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1333450204
Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]
On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX512 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and >> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the >> performance data below. >> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random >> data** >> >> |Arrays.sort benchmark | Array Size | Baseline >> (us/op)| AVX512 Sort (us/op) | Speedup | >> |--- | --- | --- | --- | --- >> | >> |ArraysSort.doubleSort | 10 | 0.034 | 0.035 >> | 1.0 | >> |ArraysSort.doubleSort | 25 | 0.116 | 0.089 >> | 1.3 | >> |ArraysSort.doubleSort | 50 | 0.282 | 0.291 >> | 1.0 | >> |ArraysSort.doubleSort | 75 | 0.474 | 0.358 >> | 1.3 | >> |ArraysSort.doubleSort | 100 | 0.654 | 0.623 >> | 1.0 | >> |ArraysSort.doubleSort | 1000| 9.274 | 6.331 >> | 1.5 | >> |ArraysSort.doubleSort | 1 | 323.339 | 71.228 >> | **4.5** | >> |ArraysSort.doubleSort | 10 | 4471.871| >> 1002.748| **4.5** | >> |ArraysSort.doubleSort | 100 | 51660.742 | >> 12921.295 | **4.0** | >> |ArraysSort.floatSort| 10 | 0.045 | 0.046 >> | 1.0 | >> |ArraysSort.floatSort| 25 | 0.103 | 0.084 >> | 1.2 | >> |ArraysSort.floatSort| 50 | 0.285 | 0.33 >> | 0.9 | >> |ArraysSort.floatSort| 75 | 0.492 | 0.346 >> | 1.4 | >> |ArraysSort.floatSort| 100 | 0.597 | 0.326 >> | 1.8 | >> |ArraysSort.floatSort| 1000| 9.811 | 5.294 >> | 1.9 | >> |ArraysSort.floatSort| 1 | 323.955 | 50.547 >> | **6.4** | >> |ArraysSort.floatSort| 10 | 4326.38 | 731.152 >> | **5.9** | >> |ArraysSort.floatSort| 100 | 52413.88| >> 8409.193| **6.2** | >> |ArraysSort.intSort | 10 | 0.033 | 0.033 >> | 1.0 | >> |ArraysSort.intSort | 25 | 0.086 | 0.051 >> | 1.7 | >> |ArraysSort.intSort | 50 | 0.236 | 0.151 >> | 1.6 | >> |ArraysSort.intSort | 75 | 0.416 | 0.332 >> | 1.3 | >> |ArraysSort.intSort | 100 | 0.63| 0.521 >> | 1.2 | >> |ArraysSort.intSort | 1000| 10.518 | 4.698 >> | 2.2 | >> |ArraysSort.intSort | 1 | 309.659 | 42.518 >> | **7.3** | >> |ArraysSort.intSort | 10 | 4130.917| >> 573.956 | **7.2** | >> |ArraysSort.intSort | 100 | 49876.307 | >> 6712.812| **7.4** | >> |ArraysSort.longSort | 10 | 0.036 | 0.037 >> | 1.0 | >> |ArraysSort.longSort | 25 | 0.094 | 0.08 >> | 1.2 | >> |ArraysSort.longSort | 50 | 0.218 | 0.227 >> | 1.0 | >> |ArraysSort.longSort | 75 | 0.466 | 0.402 >> | 1.2 | >> |ArraysSort.longSort | 100 | 0.76| 0.58 >> | 1.3 | >> |ArraysSort.longSort | 1000| 10.449 | 6 > > Srinivas Vamsi Parasa has updated the pull request incrementally with one > additional commit since the last revision: > > change variable names of indexPivot* to pivotIndex* The Java changes look good to me, it slots in very neatly with the same space/memory characteristics. It's generally pleasing to me to see recent data parallel sorting techniques from academic literature applied in the Java platform. I wish it were easier to support different OS platforms (e.g., MacOS with x86) but its a compromise on complexity of the sort implementations. I wonder if it makes sense to adjust the insertion sort thresholds when the intrinsics are enabled. That could be revisited later. - Marked as reviewed by psandoz (Reviewer). PR Review:
Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]
On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX512 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and >> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the >> performance data below. >> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random >> data** >> >> |Arrays.sort benchmark | Array Size | Baseline >> (us/op)| AVX512 Sort (us/op) | Speedup | >> |--- | --- | --- | --- | --- >> | >> |ArraysSort.doubleSort | 10 | 0.034 | 0.035 >> | 1.0 | >> |ArraysSort.doubleSort | 25 | 0.116 | 0.089 >> | 1.3 | >> |ArraysSort.doubleSort | 50 | 0.282 | 0.291 >> | 1.0 | >> |ArraysSort.doubleSort | 75 | 0.474 | 0.358 >> | 1.3 | >> |ArraysSort.doubleSort | 100 | 0.654 | 0.623 >> | 1.0 | >> |ArraysSort.doubleSort | 1000| 9.274 | 6.331 >> | 1.5 | >> |ArraysSort.doubleSort | 1 | 323.339 | 71.228 >> | **4.5** | >> |ArraysSort.doubleSort | 10 | 4471.871| >> 1002.748| **4.5** | >> |ArraysSort.doubleSort | 100 | 51660.742 | >> 12921.295 | **4.0** | >> |ArraysSort.floatSort| 10 | 0.045 | 0.046 >> | 1.0 | >> |ArraysSort.floatSort| 25 | 0.103 | 0.084 >> | 1.2 | >> |ArraysSort.floatSort| 50 | 0.285 | 0.33 >> | 0.9 | >> |ArraysSort.floatSort| 75 | 0.492 | 0.346 >> | 1.4 | >> |ArraysSort.floatSort| 100 | 0.597 | 0.326 >> | 1.8 | >> |ArraysSort.floatSort| 1000| 9.811 | 5.294 >> | 1.9 | >> |ArraysSort.floatSort| 1 | 323.955 | 50.547 >> | **6.4** | >> |ArraysSort.floatSort| 10 | 4326.38 | 731.152 >> | **5.9** | >> |ArraysSort.floatSort| 100 | 52413.88| >> 8409.193| **6.2** | >> |ArraysSort.intSort | 10 | 0.033 | 0.033 >> | 1.0 | >> |ArraysSort.intSort | 25 | 0.086 | 0.051 >> | 1.7 | >> |ArraysSort.intSort | 50 | 0.236 | 0.151 >> | 1.6 | >> |ArraysSort.intSort | 75 | 0.416 | 0.332 >> | 1.3 | >> |ArraysSort.intSort | 100 | 0.63| 0.521 >> | 1.2 | >> |ArraysSort.intSort | 1000| 10.518 | 4.698 >> | 2.2 | >> |ArraysSort.intSort | 1 | 309.659 | 42.518 >> | **7.3** | >> |ArraysSort.intSort | 10 | 4130.917| >> 573.956 | **7.2** | >> |ArraysSort.intSort | 100 | 49876.307 | >> 6712.812| **7.4** | >> |ArraysSort.longSort | 10 | 0.036 | 0.037 >> | 1.0 | >> |ArraysSort.longSort | 25 | 0.094 | 0.08 >> | 1.2 | >> |ArraysSort.longSort | 50 | 0.218 | 0.227 >> | 1.0 | >> |ArraysSort.longSort | 75 | 0.466 | 0.402 >> | 1.2 | >> |ArraysSort.longSort | 100 | 0.76| 0.58 >> | 1.3 | >> |ArraysSort.longSort | 1000| 10.449 | 6 > > Srinivas Vamsi Parasa has updated the pull request incrementally with one > additional commit since the last revision: > > change variable names of indexPivot* to pivotIndex* test/jdk/java/util/Arrays/Sorting.java line 30: > 28: * @build Sorting > 29: * @run main/othervm -XX:+UnlockDiagnosticVMOptions > -XX:DisableIntrinsic=_arraySort,_arrayPartition, Sorting -shortrun > 30: * @run main/othervm -XX:CompileThreshold=1 -XX:-TieredCompilation > Sorting -shortrun It would be useful to target the compilation threshold only for the intrinsic methods, but i suspect that is not possible? - PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r151101
Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]
On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX512 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and >> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the >> performance data below. >> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random >> data** >> >> |Arrays.sort benchmark | Array Size | Baseline >> (us/op)| AVX512 Sort (us/op) | Speedup | >> |--- | --- | --- | --- | --- >> | >> |ArraysSort.doubleSort | 10 | 0.034 | 0.035 >> | 1.0 | >> |ArraysSort.doubleSort | 25 | 0.116 | 0.089 >> | 1.3 | >> |ArraysSort.doubleSort | 50 | 0.282 | 0.291 >> | 1.0 | >> |ArraysSort.doubleSort | 75 | 0.474 | 0.358 >> | 1.3 | >> |ArraysSort.doubleSort | 100 | 0.654 | 0.623 >> | 1.0 | >> |ArraysSort.doubleSort | 1000| 9.274 | 6.331 >> | 1.5 | >> |ArraysSort.doubleSort | 1 | 323.339 | 71.228 >> | **4.5** | >> |ArraysSort.doubleSort | 10 | 4471.871| >> 1002.748| **4.5** | >> |ArraysSort.doubleSort | 100 | 51660.742 | >> 12921.295 | **4.0** | >> |ArraysSort.floatSort| 10 | 0.045 | 0.046 >> | 1.0 | >> |ArraysSort.floatSort| 25 | 0.103 | 0.084 >> | 1.2 | >> |ArraysSort.floatSort| 50 | 0.285 | 0.33 >> | 0.9 | >> |ArraysSort.floatSort| 75 | 0.492 | 0.346 >> | 1.4 | >> |ArraysSort.floatSort| 100 | 0.597 | 0.326 >> | 1.8 | >> |ArraysSort.floatSort| 1000| 9.811 | 5.294 >> | 1.9 | >> |ArraysSort.floatSort| 1 | 323.955 | 50.547 >> | **6.4** | >> |ArraysSort.floatSort| 10 | 4326.38 | 731.152 >> | **5.9** | >> |ArraysSort.floatSort| 100 | 52413.88| >> 8409.193| **6.2** | >> |ArraysSort.intSort | 10 | 0.033 | 0.033 >> | 1.0 | >> |ArraysSort.intSort | 25 | 0.086 | 0.051 >> | 1.7 | >> |ArraysSort.intSort | 50 | 0.236 | 0.151 >> | 1.6 | >> |ArraysSort.intSort | 75 | 0.416 | 0.332 >> | 1.3 | >> |ArraysSort.intSort | 100 | 0.63| 0.521 >> | 1.2 | >> |ArraysSort.intSort | 1000| 10.518 | 4.698 >> | 2.2 | >> |ArraysSort.intSort | 1 | 309.659 | 42.518 >> | **7.3** | >> |ArraysSort.intSort | 10 | 4130.917| >> 573.956 | **7.2** | >> |ArraysSort.intSort | 100 | 49876.307 | >> 6712.812| **7.4** | >> |ArraysSort.longSort | 10 | 0.036 | 0.037 >> | 1.0 | >> |ArraysSort.longSort | 25 | 0.094 | 0.08 >> | 1.2 | >> |ArraysSort.longSort | 50 | 0.218 | 0.227 >> | 1.0 | >> |ArraysSort.longSort | 75 | 0.466 | 0.402 >> | 1.2 | >> |ArraysSort.longSort | 100 | 0.76| 0.58 >> | 1.3 | >> |ArraysSort.longSort | 1000| 10.449 | 6 > > Srinivas Vamsi Parasa has updated the pull request incrementally with one > additional commit since the last revision: > > change variable names of indexPivot* to pivotIndex* make/modules/java.base/Lib.gmk line 240: > 238: > 239: ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, > x86_64)+$(INCLUDE_COMPILER2), true+true+true) > 240: ifeq ($(TOOLCHAIN_TYPE), gcc) This requirement can be folded into the "and" sequence in the `ifeq` statement above. There is nothing that really merits it to be specified separately. - PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1332770004
Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]
On Wed, 20 Sep 2023 22:46:16 GMT, Srinivas Vamsi Parasa wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one >> additional commit since the last revision: >> >> change variable names of indexPivot* to pivotIndex* > > Hi Vladimir, > > Just trying to understand: is there a reason to use > `DualPivotQuicksort_RadixForParallel.java` and > `DualPivotQuicksort_RadixForAll.java`? > > Would it not be sufficient to do the following two runs: > > 1. Baseline (Stock JDK) vs. AVX512 sort for` sort() `and `parallelSort()` ? > 2. AVX512 sort vs. Radix sort for `sort()` and `parallelSort()` ? > >> [1] current implementation in JDK [2] your AVX12 version based on [1], from >> this PR [3] my new version with Radix sort for parallel case plus your AVX12 >> changes [4] my new version with Radix sort for all cases plus your AVX12 >> changes > > Thanks, > Vamsi Hi @vamsi-parasa , @sviswa7 and @jatin-bhateja ! I agree with you that bot PRs are independent and radix sort discussion should not be here, my fault. I asked to run benchmarking because actual AVX512 changes are in C++ code and I don't have ready environment for testing. I'm moving this discussion to PR https://github.com/openjdk/jdk/pull/13568. Many thanks, Vladimir - PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1728996248
Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]
On Wed, 20 Sep 2023 22:46:16 GMT, Srinivas Vamsi Parasa wrote: > Hi Vladimir, > > Just trying to understand: is there a reason to use > `DualPivotQuicksort_RadixForParallel.java` and > `DualPivotQuicksort_RadixForAll.java`? > > Would it not be sufficient to do the following two runs: > > 1. Baseline (Stock JDK) vs. AVX512 sort for`sort()`and `parallelSort()` ? > 2. AVX512 sort vs. Radix sort for `sort()` and `parallelSort()` ? > > > [1] current implementation in JDK [2] your AVX12 version based on [1], from > > this PR [3] my new version with Radix sort for parallel case plus your > > AVX12 changes [4] my new version with Radix sort for all cases plus your > > AVX12 changes > > Thanks, Vamsi Hi @vamsi-parasa , Please also add C1 intrinsification support for newly carved sort / partition intrinsics in your follow up PR https://bugs.openjdk.org/browse/JDK-8315382 , it may further improve the overall runtime for large sized array sort in tiered compilation mode. - PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1728877814
Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]
On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX512 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and >> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the >> performance data below. >> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random >> data** >> >> |Arrays.sort benchmark | Array Size | Baseline >> (us/op)| AVX512 Sort (us/op) | Speedup | >> |--- | --- | --- | --- | --- >> | >> |ArraysSort.doubleSort | 10 | 0.034 | 0.035 >> | 1.0 | >> |ArraysSort.doubleSort | 25 | 0.116 | 0.089 >> | 1.3 | >> |ArraysSort.doubleSort | 50 | 0.282 | 0.291 >> | 1.0 | >> |ArraysSort.doubleSort | 75 | 0.474 | 0.358 >> | 1.3 | >> |ArraysSort.doubleSort | 100 | 0.654 | 0.623 >> | 1.0 | >> |ArraysSort.doubleSort | 1000| 9.274 | 6.331 >> | 1.5 | >> |ArraysSort.doubleSort | 1 | 323.339 | 71.228 >> | **4.5** | >> |ArraysSort.doubleSort | 10 | 4471.871| >> 1002.748| **4.5** | >> |ArraysSort.doubleSort | 100 | 51660.742 | >> 12921.295 | **4.0** | >> |ArraysSort.floatSort| 10 | 0.045 | 0.046 >> | 1.0 | >> |ArraysSort.floatSort| 25 | 0.103 | 0.084 >> | 1.2 | >> |ArraysSort.floatSort| 50 | 0.285 | 0.33 >> | 0.9 | >> |ArraysSort.floatSort| 75 | 0.492 | 0.346 >> | 1.4 | >> |ArraysSort.floatSort| 100 | 0.597 | 0.326 >> | 1.8 | >> |ArraysSort.floatSort| 1000| 9.811 | 5.294 >> | 1.9 | >> |ArraysSort.floatSort| 1 | 323.955 | 50.547 >> | **6.4** | >> |ArraysSort.floatSort| 10 | 4326.38 | 731.152 >> | **5.9** | >> |ArraysSort.floatSort| 100 | 52413.88| >> 8409.193| **6.2** | >> |ArraysSort.intSort | 10 | 0.033 | 0.033 >> | 1.0 | >> |ArraysSort.intSort | 25 | 0.086 | 0.051 >> | 1.7 | >> |ArraysSort.intSort | 50 | 0.236 | 0.151 >> | 1.6 | >> |ArraysSort.intSort | 75 | 0.416 | 0.332 >> | 1.3 | >> |ArraysSort.intSort | 100 | 0.63| 0.521 >> | 1.2 | >> |ArraysSort.intSort | 1000| 10.518 | 4.698 >> | 2.2 | >> |ArraysSort.intSort | 1 | 309.659 | 42.518 >> | **7.3** | >> |ArraysSort.intSort | 10 | 4130.917| >> 573.956 | **7.2** | >> |ArraysSort.intSort | 100 | 49876.307 | >> 6712.812| **7.4** | >> |ArraysSort.longSort | 10 | 0.036 | 0.037 >> | 1.0 | >> |ArraysSort.longSort | 25 | 0.094 | 0.08 >> | 1.2 | >> |ArraysSort.longSort | 50 | 0.218 | 0.227 >> | 1.0 | >> |ArraysSort.longSort | 75 | 0.466 | 0.402 >> | 1.2 | >> |ArraysSort.longSort | 100 | 0.76| 0.58 >> | 1.3 | >> |ArraysSort.longSort | 1000| 10.449 | 6 > > Srinivas Vamsi Parasa has updated the pull request incrementally with one > additional commit since the last revision: > > change variable names of indexPivot* to pivotIndex* > Hi Vamsi, > > In this comment [#13568 > (comment)](https://github.com/openjdk/jdk/pull/13568#issuecomment-1728082819) > Paul suggested comparing of performance. > > Could you please run benchmarking of the following 4 class? > > [1] current implementation in JDK [2] your AVX12 version based on [1], from > this PR [3] my new version with Radix sort for parallel case plus your AVX12 > changes > https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_RadixForParallel.java > [4] my new version with Radix sort for all cases plus your AVX12 changes >
Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]
On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX512 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and >> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the >> performance data below. >> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random >> data** >> >> |Arrays.sort benchmark | Array Size | Baseline >> (us/op)| AVX512 Sort (us/op) | Speedup | >> |--- | --- | --- | --- | --- >> | >> |ArraysSort.doubleSort | 10 | 0.034 | 0.035 >> | 1.0 | >> |ArraysSort.doubleSort | 25 | 0.116 | 0.089 >> | 1.3 | >> |ArraysSort.doubleSort | 50 | 0.282 | 0.291 >> | 1.0 | >> |ArraysSort.doubleSort | 75 | 0.474 | 0.358 >> | 1.3 | >> |ArraysSort.doubleSort | 100 | 0.654 | 0.623 >> | 1.0 | >> |ArraysSort.doubleSort | 1000| 9.274 | 6.331 >> | 1.5 | >> |ArraysSort.doubleSort | 1 | 323.339 | 71.228 >> | **4.5** | >> |ArraysSort.doubleSort | 10 | 4471.871| >> 1002.748| **4.5** | >> |ArraysSort.doubleSort | 100 | 51660.742 | >> 12921.295 | **4.0** | >> |ArraysSort.floatSort| 10 | 0.045 | 0.046 >> | 1.0 | >> |ArraysSort.floatSort| 25 | 0.103 | 0.084 >> | 1.2 | >> |ArraysSort.floatSort| 50 | 0.285 | 0.33 >> | 0.9 | >> |ArraysSort.floatSort| 75 | 0.492 | 0.346 >> | 1.4 | >> |ArraysSort.floatSort| 100 | 0.597 | 0.326 >> | 1.8 | >> |ArraysSort.floatSort| 1000| 9.811 | 5.294 >> | 1.9 | >> |ArraysSort.floatSort| 1 | 323.955 | 50.547 >> | **6.4** | >> |ArraysSort.floatSort| 10 | 4326.38 | 731.152 >> | **5.9** | >> |ArraysSort.floatSort| 100 | 52413.88| >> 8409.193| **6.2** | >> |ArraysSort.intSort | 10 | 0.033 | 0.033 >> | 1.0 | >> |ArraysSort.intSort | 25 | 0.086 | 0.051 >> | 1.7 | >> |ArraysSort.intSort | 50 | 0.236 | 0.151 >> | 1.6 | >> |ArraysSort.intSort | 75 | 0.416 | 0.332 >> | 1.3 | >> |ArraysSort.intSort | 100 | 0.63| 0.521 >> | 1.2 | >> |ArraysSort.intSort | 1000| 10.518 | 4.698 >> | 2.2 | >> |ArraysSort.intSort | 1 | 309.659 | 42.518 >> | **7.3** | >> |ArraysSort.intSort | 10 | 4130.917| >> 573.956 | **7.2** | >> |ArraysSort.intSort | 100 | 49876.307 | >> 6712.812| **7.4** | >> |ArraysSort.longSort | 10 | 0.036 | 0.037 >> | 1.0 | >> |ArraysSort.longSort | 25 | 0.094 | 0.08 >> | 1.2 | >> |ArraysSort.longSort | 50 | 0.218 | 0.227 >> | 1.0 | >> |ArraysSort.longSort | 75 | 0.466 | 0.402 >> | 1.2 | >> |ArraysSort.longSort | 100 | 0.76| 0.58 >> | 1.3 | >> |ArraysSort.longSort | 1000| 10.449 | 6 > > Srinivas Vamsi Parasa has updated the pull request incrementally with one > additional commit since the last revision: > > change variable names of indexPivot* to pivotIndex* Hi Vladimir, Sure, will run the benchmarks and post the JMH performance data. Just trying to understand: is there a reason to use `DualPivotQuicksort_RadixForParallel.java` and `DualPivotQuicksort_RadixForAll.java`? Would it not be sufficient to do the following two runs: 1. Baseline (Stock JDK) vs. AVX512 sort for` sort() `and `parallelSort()` ? 2. AVX512 sort vs. Radix sort for `sort()` and `parallelSort(`) ? > [1] current implementation in JDK [2] your AVX12 version based on [1], from > this PR [3] my new version with Radix sort for parallel case plus your AVX12 >
Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]
On Wed, 20 Sep 2023 17:19:42 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX512 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and >> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the >> performance data below. >> >> >> **Arrays.sort performance data using JMH benchmarks for arrays with random >> data** >> >> |Arrays.sort benchmark | Array Size | Baseline >> (us/op)| AVX512 Sort (us/op) | Speedup | >> |--- | --- | --- | --- | --- >> | >> |ArraysSort.doubleSort | 10 | 0.034 | 0.035 >> | 1.0 | >> |ArraysSort.doubleSort | 25 | 0.116 | 0.089 >> | 1.3 | >> |ArraysSort.doubleSort | 50 | 0.282 | 0.291 >> | 1.0 | >> |ArraysSort.doubleSort | 75 | 0.474 | 0.358 >> | 1.3 | >> |ArraysSort.doubleSort | 100 | 0.654 | 0.623 >> | 1.0 | >> |ArraysSort.doubleSort | 1000| 9.274 | 6.331 >> | 1.5 | >> |ArraysSort.doubleSort | 1 | 323.339 | 71.228 >> | **4.5** | >> |ArraysSort.doubleSort | 10 | 4471.871| >> 1002.748| **4.5** | >> |ArraysSort.doubleSort | 100 | 51660.742 | >> 12921.295 | **4.0** | >> |ArraysSort.floatSort| 10 | 0.045 | 0.046 >> | 1.0 | >> |ArraysSort.floatSort| 25 | 0.103 | 0.084 >> | 1.2 | >> |ArraysSort.floatSort| 50 | 0.285 | 0.33 >> | 0.9 | >> |ArraysSort.floatSort| 75 | 0.492 | 0.346 >> | 1.4 | >> |ArraysSort.floatSort| 100 | 0.597 | 0.326 >> | 1.8 | >> |ArraysSort.floatSort| 1000| 9.811 | 5.294 >> | 1.9 | >> |ArraysSort.floatSort| 1 | 323.955 | 50.547 >> | **6.4** | >> |ArraysSort.floatSort| 10 | 4326.38 | 731.152 >> | **5.9** | >> |ArraysSort.floatSort| 100 | 52413.88| >> 8409.193| **6.2** | >> |ArraysSort.intSort | 10 | 0.033 | 0.033 >> | 1.0 | >> |ArraysSort.intSort | 25 | 0.086 | 0.051 >> | 1.7 | >> |ArraysSort.intSort | 50 | 0.236 | 0.151 >> | 1.6 | >> |ArraysSort.intSort | 75 | 0.416 | 0.332 >> | 1.3 | >> |ArraysSort.intSort | 100 | 0.63| 0.521 >> | 1.2 | >> |ArraysSort.intSort | 1000| 10.518 | 4.698 >> | 2.2 | >> |ArraysSort.intSort | 1 | 309.659 | 42.518 >> | **7.3** | >> |ArraysSort.intSort | 10 | 4130.917| >> 573.956 | **7.2** | >> |ArraysSort.intSort | 100 | 49876.307 | >> 6712.812| **7.4** | >> |ArraysSort.longSort | 10 | 0.036 | 0.037 >> | 1.0 | >> |ArraysSort.longSort | 25 | 0.094 | 0.08 >> | 1.2 | >> |ArraysSort.longSort | 50 | 0.218 | 0.227 >> | 1.0 | >> |ArraysSort.longSort | 75 | 0.466 | 0.402 >> | 1.2 | >> |ArraysSort.longSort | 100 | 0.76| 0.58 >> | 1.3 | >> |ArraysSort.longSort | 1000| 10.449 | 6 > > Srinivas Vamsi Parasa has updated the pull request incrementally with one > additional commit since the last revision: > > change variable names of indexPivot* to pivotIndex* Hi Vamsi, In this comment https://github.com/openjdk/jdk/pull/13568#issuecomment-1728082819 Paul suggested comparing of performance. Could you please run benchmarking of the following 4 class? [1] current implementation in JDK [2] your AVX12 version based on [1], from this PR [3] my new version with Radix sort for parallel case plus your AVX12 changes https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_RadixForParallel.java [4] my new version with Radix sort for all cases plus your AVX12 changes
Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v40]
> The goal is to develop faster sort routines for x86_64 CPUs by taking > advantage of AVX512 instructions. This enhancement provides an order of > magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto > ~4.5x improvement for 64-bit datatypes (long, double) as shown in the > performance data below. > > > **Arrays.sort performance data using JMH benchmarks for arrays with random > data** > > | Arrays.sort benchmark | Array Size | Baseline > (us/op)| AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- > | > | ArraysSort.doubleSort | 10 | 0.034 | 0.035 > | 1.0 | > | ArraysSort.doubleSort | 25 | 0.116 | 0.089 > | 1.3 | > | ArraysSort.doubleSort | 50 | 0.282 | 0.291 > | 1.0 | > | ArraysSort.doubleSort | 75 | 0.474 | 0.358 > | 1.3 | > | ArraysSort.doubleSort | 100 | 0.654 | 0.623 > | 1.0 | > | ArraysSort.doubleSort | 1000| 9.274 | 6.331 > | 1.5 | > | ArraysSort.doubleSort | 1 | 323.339 | 71.228 > | **4.5** | > | ArraysSort.doubleSort | 10 | 4471.871| > 1002.748| **4.5** | > | ArraysSort.doubleSort | 100 | 51660.742 | > 12921.295 | **4.0** | > | ArraysSort.floatSort| 10 | 0.045 | 0.046 > | 1.0 | > | ArraysSort.floatSort| 25 | 0.103 | 0.084 > | 1.2 | > | ArraysSort.floatSort| 50 | 0.285 | 0.33 > | 0.9 | > | ArraysSort.floatSort| 75 | 0.492 | 0.346 > | 1.4 | > | ArraysSort.floatSort| 100 | 0.597 | 0.326 > | 1.8 | > | ArraysSort.floatSort| 1000| 9.811 | 5.294 > | 1.9 | > | ArraysSort.floatSort| 1 | 323.955 | 50.547 > | **6.4** | > | ArraysSort.floatSort| 10 | 4326.38 | 731.152 > | **5.9** | > | ArraysSort.floatSort| 100 | 52413.88| > 8409.193| **6.2** | > | ArraysSort.intSort | 10 | 0.033 | 0.033 > | 1.0 | > | ArraysSort.intSort | 25 | 0.086 | 0.051 > | 1.7 | > | ArraysSort.intSort | 50 | 0.236 | 0.151 > | 1.6 | > | ArraysSort.intSort | 75 | 0.416 | 0.332 > | 1.3 | > | ArraysSort.intSort | 100 | 0.63| 0.521 > | 1.2 | > | ArraysSort.intSort | 1000| 10.518 | 4.698 > | 2.2 | > | ArraysSort.intSort | 1 | 309.659 | 42.518 > | **7.3** | > | ArraysSort.intSort | 10 | 4130.917| > 573.956 | **7.2** | > | ArraysSort.intSort | 100 | 49876.307 | > 6712.812| **7.4** | > | ArraysSort.longSort | 10 | 0.036 | 0.037 > | 1.0 | > | ArraysSort.longSort | 25 | 0.094 | 0.08 > | 1.2 | > | ArraysSort.longSort | 50 | 0.218 | 0.227 > | 1.0 | > | ArraysSort.longSort | 75 | 0.466 | 0.402 > | 1.2 | > | ArraysSort.longSort | 100 | 0.76| 0.58 > | 1.3 | > | ArraysSort.longSort | 1000| 10.449 | 6.239 > | 1.7 | > | ArraysSort.longSort | 1 | 307.074 | 70.284 > | **4.4** | > | ArraysSor... Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: change variable names of indexPivot* to pivotIndex* - Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/3e0b8cfc..b04cb6c3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=14227=39 - incr: https://webrevs.openjdk.org/?repo=jdk=14227=38-39 Stats: 49 lines in 1 file changed: 0 ins; 6 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227