On Thu, 31 Aug 2023 18:45:39 GMT, Srinivas Vamsi Parasa <d...@openjdk.org> wrote:
>> The goal is to develop faster sort routines for x86_64 CPUs by taking
>> advantage of AVX512 instructions. This enhancement provides an order of
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>>
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and
>> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the
>> performance data below.
>>
>> **Arrays.sort performance data using JMH benchmarks for arrays with random
>> data**
>>
>> | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup |
>> | --- | --- | --- | --- | --- |
>> | ArraysSort.doubleSort | 10 | 0.034 | 0.035 | 1.0 |
>> | ArraysSort.doubleSort | 25 | 0.116 | 0.089 | 1.3 |
>> | ArraysSort.doubleSort | 50 | 0.282 | 0.291 | 1.0 |
>> | ArraysSort.doubleSort | 75 | 0.474 | 0.358 | 1.3 |
>> | ArraysSort.doubleSort | 100 | 0.654 | 0.623 | 1.0 |
>> | ArraysSort.doubleSort | 1000 | 9.274 | 6.331 | 1.5 |
>> | ArraysSort.doubleSort | 10000 | 323.339 | 71.228 | **4.5** |
>> | ArraysSort.doubleSort | 100000 | 4471.871 | 1002.748 | **4.5** |
>> | ArraysSort.doubleSort | 1000000 | 51660.742 | 12921.295 | **4.0** |
>> | ArraysSort.floatSort | 10 | 0.045 | 0.046 | 1.0 |
>> | ArraysSort.floatSort | 25 | 0.103 | 0.084 | 1.2 |
>> | ArraysSort.floatSort | 50 | 0.285 | 0.33 | 0.9 |
>> | ArraysSort.floatSort | 75 | 0.492 | 0.346 | 1.4 |
>> | ArraysSort.floatSort | 100 | 0.597 | 0.326 | 1.8 |
>> | ArraysSort.floatSort | 1000 | 9.811 | 5.294 | 1.9 |
>> | ArraysSort.floatSort | 10000 | 323.955 | 50.547 | **6.4** |
>> | ArraysSort.floatSort | 100000 | 4326.38 | 731.152 | **5.9** |
>> | ArraysSort.floatSort | 1000000 | 52413.88 | 8409.193 | **6.2** |
>> | ArraysSort.intSort | 10 | 0.033 | 0.033 | 1.0 |
>> | ArraysSort.intSort | 25 | 0.086 | 0.051 | 1.7 |
>> | ArraysSort.intSort | 50 | 0.236 | 0.151 | 1.6 |
>> | ArraysSort.intSort | 75 | 0.416 | 0.332 | 1.3 |
>> | ArraysSort.intSort | 100 | 0.63 | 0.521 | 1.2 |
>> | ArraysSort.intSort | 1000 | 10.518 | 4.698 | 2.2 |
>> | ArraysSort.intSort | 10000 | 309.659 | 42.518 | **7.3** |
>> | ArraysSort.intSort | 100000 | 4130.917 | 573.956 | **7.2** |
>> | ArraysSort.intSort | 1000000 | 49876.307 | 6712.812 | **7.4** |
>> | ArraysSort.longSort | 10 | 0.036 | 0.037 | 1.0 |
>> | ArraysSort.longSort | 25 | 0.094 | 0.08 | 1.2 |
>> | ArraysSort.longSort | 50 | 0.218 | 0.227 | 1.0 |
>> | ArraysSort.longSort | 75 | 0.466 | 0.402 | 1.2 |
>> | ArraysSort.longSort | 100 | 0.76 | 0.58 | 1.3 |
>> | ArraysSort.longSort | 1000 | 10.449 | 6....
>
> Srinivas Vamsi Parasa has updated the pull request with a new target base due
> to a merge or a rebase. The pull request now contains 32 commits:
>
> - update build script
> - Merge branch 'master' of https://git.openjdk.org/jdk into avx512sort
> - Clean up parameters passed to arrayPartition; update the check to load
>   library
> - Remove unnecessary import in Arrays.java
> - Move sort and partition intrinsics from Arrays.java to DPQS.java
> - Fix unused assignment in DPQS.java and space in Arrays.java
> - add parallelSort benchmarking
> - Update copyright for DPQS.java; replace avx512 pivot calculation with
>   scalar version
> - Update avx512-common-qsort.h
> - Decomposed DPQS using AVX512 partitioning and AVX512 sort (for small
>   arrays). Works for serial and parallel sort.
> - ... and 22 more: https://git.openjdk.org/jdk/compare/c8acab1d...1746eedd

make/modules/java.base/Lib.gmk line 255:

> 253: TARGETS += $(BUILD_LIB_X86_64)
> 254: endif
> 255: endif

Indentation looks off here (https://openjdk.org/groups/build/doc/code-conventions.html)
Suggestion:

    ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, x86_64)+$(INCLUDE_COMPILER2), true+true+true)
      ifeq ($(TOOLCHAIN_TYPE), gcc)
        $(eval $(call SetupJdkLibrary, BUILD_LIB_X86_64, \
            NAME := x86_64, \
            TOOLCHAIN := TOOLCHAIN_LINK_CXX, \
            OPTIMIZATION := HIGH, \
            CFLAGS := $(CFLAGS_JDKLIB), \
            CXXFLAGS := $(CXXFLAGS_JDKLIB), \
            LDFLAGS := $(LDFLAGS_JDKLIB) \
                $(call SET_SHARED_LIBRARY_ORIGIN), \
            LIBS := $(LIBCXX), \
            LIBS_linux := -lc -lm -ldl, \
        ))

        TARGETS += $(BUILD_LIB_X86_64)
      endif
    endif

I'm also still wondering about the library name. It's very generic for something that seems to be rather specific.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1312139279