Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30]

Erik Joelsson Mon, 28 Aug 2023 16:49:57 -0700

On Mon, 28 Aug 2023 21:27:25 GMT, Srinivas Vamsi Parasa <[email protected]> 
wrote:


>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX512 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and 
>> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the 
>> performance data below.
>> 
>> 
>> **Arrays.sort performance data using JMH benchmarks for arrays with random 
>> data** 
>> 
>> |    Arrays.sort benchmark   |       Array Size      |       Baseline 
>> (us/op)        |       AVX512 Sort (us/op)     |       Speedup |
>> |    ---     |       ---     |       ---     |       ---     |       ---     
>> |
>> |    ArraysSort.doubleSort   |       10      |       0.034   |       0.035   
>> |       1.0     |
>> |    ArraysSort.doubleSort   |       25      |       0.116   |       0.089   
>> |       1.3     |
>> |    ArraysSort.doubleSort   |       50      |       0.282   |       0.291   
>> |       1.0     |
>> |    ArraysSort.doubleSort   |       75      |       0.474   |       0.358   
>> |       1.3     |
>> |    ArraysSort.doubleSort   |       100     |       0.654   |       0.623   
>> |       1.0     |
>> |    ArraysSort.doubleSort   |       1000    |       9.274   |       6.331   
>> |       1.5     |
>> |    ArraysSort.doubleSort   |       10000   |       323.339 |       71.228  
>> |       **4.5** |
>> |    ArraysSort.doubleSort   |       100000  |       4471.871        |       
>> 1002.748        |       **4.5** |
>> |    ArraysSort.doubleSort   |       1000000 |       51660.742       |       
>> 12921.295       |       **4.0** |
>> |    ArraysSort.floatSort    |       10      |       0.045   |       0.046   
>> |       1.0     |
>> |    ArraysSort.floatSort    |       25      |       0.103   |       0.084   
>> |       1.2     |
>> |    ArraysSort.floatSort    |       50      |       0.285   |       0.33    
>> |       0.9     |
>> |    ArraysSort.floatSort    |       75      |       0.492   |       0.346   
>> |       1.4     |
>> |    ArraysSort.floatSort    |       100     |       0.597   |       0.326   
>> |       1.8     |
>> |    ArraysSort.floatSort    |       1000    |       9.811   |       5.294   
>> |       1.9     |
>> |    ArraysSort.floatSort    |       10000   |       323.955 |       50.547  
>> |       **6.4** |
>> |    ArraysSort.floatSort    |       100000  |       4326.38 |       731.152 
>> |       **5.9** |
>> |    ArraysSort.floatSort    |       1000000 |       52413.88        |       
>> 8409.193        |       **6.2** |
>> |    ArraysSort.intSort      |       10      |       0.033   |       0.033   
>> |       1.0     |
>> |    ArraysSort.intSort      |       25      |       0.086   |       0.051   
>> |       1.7     |
>> |    ArraysSort.intSort      |       50      |       0.236   |       0.151   
>> |       1.6     |
>> |    ArraysSort.intSort      |       75      |       0.416   |       0.332   
>> |       1.3     |
>> |    ArraysSort.intSort      |       100     |       0.63    |       0.521   
>> |       1.2     |
>> |    ArraysSort.intSort      |       1000    |       10.518  |       4.698   
>> |       2.2     |
>> |    ArraysSort.intSort      |       10000   |       309.659 |       42.518  
>> |       **7.3** |
>> |    ArraysSort.intSort      |       100000  |       4130.917        |       
>> 573.956 |       **7.2** |
>> |    ArraysSort.intSort      |       1000000 |       49876.307       |       
>> 6712.812        |       **7.4** |
>> |    ArraysSort.longSort     |       10      |       0.036   |       0.037   
>> |       1.0     |
>> |    ArraysSort.longSort     |       25      |       0.094   |       0.08    
>> |       1.2     |
>> |    ArraysSort.longSort     |       50      |       0.218   |       0.227   
>> |       1.0     |
>> |    ArraysSort.longSort     |       75      |       0.466   |       0.402   
>> |       1.2     |
>> |    ArraysSort.longSort     |       100     |       0.76    |       0.58    
>> |       1.3     |
>> |    ArraysSort.longSort     |       1000    |       10.449  |       6....
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Clean up parameters passed to arrayPartition; update the check to load 
> library

make/modules/java.base/Lib.gmk line 240:

> 238: 
> 239: ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, 
> x86_64)+$(INCLUDE_COMPILER2), true+true+true)
> 240:   $(eval $(call SetupJdkLibrary, BUILD_LIB_X86_64, \

As this is a C++ lib, consider using g++ for linking by setting:

TOOLCHAIN := TOOLCHAIN_LINK_CXX

make/modules/java.base/Lib.gmk line 241:

> 239: ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, 
> x86_64)+$(INCLUDE_COMPILER2), true+true+true)
> 240:   $(eval $(call SetupJdkLibrary, BUILD_LIB_X86_64, \
> 241:       NAME := x86_64, \

This looks like a rather generic name for a library. I would expect something a 
bit more descriptive. 

I also noted that @vnkozlov questioned needing a separate library for this and 
I didn't really find an answer. What do we gain from separating this into a 
separate dynamic library?

make/modules/java.base/Lib.gmk line 247:

> 245:       LDFLAGS := $(LDFLAGS_JDKLIB) \
> 246:           $(call SET_SHARED_LIBRARY_ORIGIN), \
> 247:       LDFLAGS_linux := -Wl$(COMMA)--no-as-needed, \

This is set by default since JDK-8314554.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1308053768
PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1308051118
PR Review Comment: https://git.openjdk.org/jdk/pull/14227#discussion_r1308052384

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v30]

Reply via email to