Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v9]

2023-12-06 Thread Srinivas Vamsi Parasa
> The goal is to develop faster sort routines for x86_64 CPUs by taking 
> advantage of AVX2 instructions. This enhancement provides an order of 
> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
> 
> For serial sort on random data, this PR shows upto ~7.5x improvement for 
> 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the 
> performance data below.
> 
> For parallel sort on random data, this PR shows upto ~3.4x for 32-bit 
> datatypes (int, float) as shown below.
> 
> **Note:** This PR also improves the performance of AVX512 sort by upto 35%.
> 
>  xmlns:o="urn:schemas-microsoft-com:office:office"
> xmlns:x="urn:schemas-microsoft-com:office:excel"
> xmlns="http://www.w3.org/TR/REC-html40;>
> 
> 
> 
> 
> 
>  href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
>  href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">
> 
> 
> 
> 
> 
> 
> 
> 
> Benchmark (Serial Sort) | Size | Baseline  (us/op) | AVX2 (us/op) | 
> Speedup
> -- | -- | -- | -- | --
> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6
> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3
> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1
> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7
> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0
> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6
> 
> 
> 
> 
> 
>  xmlns:o="urn:schemas-microsoft-com:office:office"
> xmlns:x="urn:schemas-microsoft-com:office:excel"
> xmlns="http://www.w3.org/TR/REC-html40;>
> 
> 
> 
> 
> 
>  href="file:///C:/Users/...

Srinivas Vamsi Parasa has updated the pull request incrementally with one 
additional commit since the last revision:

  remove unused avx2 64 bit sort functions; add assertions

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/16534/files
  - new: https://git.openjdk.org/jdk/pull/16534/files/bc590d9f..c143e0b9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=16534=08
 - incr: https://webrevs.openjdk.org/?repo=jdk=16534=07-08

  Stats: 128 lines in 4 files changed: 12 ins; 116 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/16534.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16534/head:pull/16534

PR: https://git.openjdk.org/jdk/pull/16534


Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v9]

2023-12-06 Thread Srinivas Vamsi Parasa
On Tue, 5 Dec 2023 19:19:23 GMT, Jatin Bhateja  wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   remove unused avx2 64 bit sort functions; add assertions
>
> src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 50:
> 
>> 48: case JVM_T_DOUBLE:
>> 49: avx2_fast_sort((double*)array, from_index, to_index, 
>> INSERTION_SORT_THRESHOLD_64BIT);
>> 50: break;
> 
> Please add safe assertions for missing types.

This is from an older (but outdated) commit. The assertions have been added in 
other cases.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417706670


Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v9]

2023-12-06 Thread Srinivas Vamsi Parasa
On Wed, 6 Dec 2023 11:59:19 GMT, Magnus Ihse Bursie  wrote:

>> Hi Magnus (@magicus),
>>  
>>> Are you saying that when compiling with GCC 6, it will just silently ignore 
>>> `-std=c++17`? I'd have assumed that it printed a warning or error about an 
>>> unknown or invalid option, if C++17 is not supported.
>> 
>> The GCC complier for versions 6 (and even 5) silently ignores the flag 
>> `-std=c++17`. It does not print any warning or error. I tested it with a toy 
>> C++ program and also by building OpenJDK using GCC 6. 
>> 
>>> You can't check for if compiler options should be enabled or not inside 
>>> source code files.
>> 
>>  what I meant was, there are #ifdef guards using predefined macros in the 
>> C++ source code to check for GCC version and make the simdsort code 
>> available for compilation or not based on the GCC version
>> 
>> 
>> // src/java.base/linux/native/libsimdsort/simdsort-support.hpp
>> #if defined(_LP64) && (defined(__GNUC__) && ((__GNUC__ > 7) || ((__GNUC__ == 
>> 7) && (__GNUC_MINOR__ >= 5
>> #define __SIMDSORT_SUPPORTED_LINUX
>> #endif
>> 
>> 
>> 
>> //src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp
>> #include "simdsort-support.hpp"
>> #ifdef __SIMDSORT_SUPPORTED_LINUX
>> 
>> #endif
>
> Okay, then I guess I am fine with this.

Thank you Magnus!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417707661