Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]

2023-12-06 Thread Srinivas Vamsi Parasa
On Tue, 5 Dec 2023 19:37:34 GMT, Jatin Bhateja  wrote:

>> Srinivas Vamsi Parasa has updated the pull request with a new target base 
>> due to a merge or a rebase. The incremental webrev excludes the unrelated 
>> changes brought in by the merge/rebase. The pull request contains 17 
>> additional commits since the last revision:
>> 
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>>  - add GCC version guards
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>>  - Remove C++17 from C flags
>>  - add avoid masked stores operation
>>  - update the code to check for supported simd sort cpus
>>  - Disable AVX2 sort for 64-bit types
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>>  - fix jcheck failures due to windows encoding
>>  - fix carriage return and change insertion sort thresholds
>>  - ... and 7 more: https://git.openjdk.org/jdk/compare/d4151e5b...bc590d9f
>
> src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 64:
> 
>> 62: }
>> 63: return lut;
>> 64: }();
> 
> Lut64 is needed for compress64 emulation, can be removed.

Removed in the latest commit...

> src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 234:
> 
>> 232: 
>> 233: vtype::mask_storeu(leftStore, left, temp);
>> 234: }
> 
> Can be removed if not being used.

Removed in the latest commit...

> src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 277:
> 
>> 275: 
>> 276: return _mm_popcnt_u32(shortMask);
>> 277: }
> 
> Can be removed if not being used.

Removed in the latest commit...

> src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 44:
> 
>> 42: break;
>> 43: case JVM_T_FLOAT:
>> 44: avx2_fast_sort((float*)array, from_index, to_index, 
>> INSERTION_SORT_THRESHOLD_32BIT);
> 
> Assertions for unsupported types.

Added in the latest commit...

> src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 56:
> 
>> 54: case JVM_T_FLOAT:
>> 55: avx2_fast_partition((float*)array, from_index, to_index, 
>> pivot_indices, index_pivot1, index_pivot2);
>> 56: break;
> 
> Please add assertion for unsupported types.

Added in the latest commit...

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417701182
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417702999
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417702251
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417701469
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417701705


Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]

2023-12-06 Thread Srinivas Vamsi Parasa
On Tue, 5 Dec 2023 19:33:48 GMT, Jatin Bhateja  wrote:

>> Srinivas Vamsi Parasa has updated the pull request with a new target base 
>> due to a merge or a rebase. The incremental webrev excludes the unrelated 
>> changes brought in by the merge/rebase. The pull request contains 17 
>> additional commits since the last revision:
>> 
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>>  - add GCC version guards
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>>  - Remove C++17 from C flags
>>  - add avoid masked stores operation
>>  - update the code to check for supported simd sort cpus
>>  - Disable AVX2 sort for 64-bit types
>>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>>  - fix jcheck failures due to windows encoding
>>  - fix carriage return and change insertion sort thresholds
>>  - ... and 7 more: https://git.openjdk.org/jdk/compare/d8b29378...bc590d9f
>
> src/java.base/linux/native/libsimdsort/avx512-32bit-qsort.hpp line 235:
> 
>> 233: return avx512_double_compressstore>(
>> 234: left_addr, right_addr, k, reg);
>> 235: }
> 
> Can be removed.

This is needed for AVX512 sort...

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417690992


Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]

2023-12-06 Thread Magnus Ihse Bursie
On Tue, 5 Dec 2023 17:26:06 GMT, Srinivas Vamsi Parasa  wrote:

>> That sounds weird. You can't check for if compiler options should be enabled 
>> or not inside source code files.
>> 
>> Are you saying that when compiling with GCC 6, it will just silently ignore 
>> `-std=c++17`? I'd have assumed that it printed a warning or error about an 
>> unknown or invalid option, if C++17 is not supported.
>
> Hi Magnus (@magicus),
>  
>> Are you saying that when compiling with GCC 6, it will just silently ignore 
>> `-std=c++17`? I'd have assumed that it printed a warning or error about an 
>> unknown or invalid option, if C++17 is not supported.
> 
> The GCC complier for versions 6 (and even 5) silently ignores the flag 
> `-std=c++17`. It does not print any warning or error. I tested it with a toy 
> C++ program and also by building OpenJDK using GCC 6. 
> 
>> You can't check for if compiler options should be enabled or not inside 
>> source code files.
> 
>  what I meant was, there are #ifdef guards using predefined macros in the C++ 
> source code to check for GCC version and make the simdsort code available for 
> compilation or not based on the GCC version
> 
> 
> // src/java.base/linux/native/libsimdsort/simdsort-support.hpp
> #if defined(_LP64) && (defined(__GNUC__) && ((__GNUC__ > 7) || ((__GNUC__ == 
> 7) && (__GNUC_MINOR__ >= 5
> #define __SIMDSORT_SUPPORTED_LINUX
> #endif
> 
> 
> 
> //src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp
> #include "simdsort-support.hpp"
> #ifdef __SIMDSORT_SUPPORTED_LINUX
> 
> #endif

Okay, then I guess I am fine with this.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417170882


Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]

2023-12-06 Thread Magnus Ihse Bursie
On Mon, 4 Dec 2023 22:15:24 GMT, Srinivas Vamsi Parasa  wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX2 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> For serial sort on random data, this PR shows upto ~7.5x improvement for 
>> 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the 
>> performance data below.
>> 
>> For parallel sort on random data, this PR shows upto ~3.4x for 32-bit 
>> datatypes (int, float) as shown below.
>> 
>> **Note:** This PR also improves the performance of AVX512 sort by upto 35%.
>> 
>> > xmlns:o="urn:schemas-microsoft-com:office:office"
>> xmlns:x="urn:schemas-microsoft-com:office:excel"
>> xmlns="http://www.w3.org/TR/REC-html40;>
>> 
>> 
>> 
>> 
>> 
>> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
>> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> Benchmark (Serial Sort) | Size | Baseline  (us/op) | AVX2 (us/op) | 
>> Speedup
>> -- | -- | -- | -- | --
>> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
>> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
>> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
>> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
>> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
>> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
>> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6
>> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3
>> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1
>> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
>> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
>> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
>> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
>> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
>> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
>> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7
>> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0
>> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6
>> 
>> 
>> 
>> 
>> 
>> > xmlns:o="urn:schemas-microsoft-com:office:office"
>> xmlns:x="urn:schemas-microsoft-com:office:excel"
>> xmlns="http://www.w3.org/TR/REC-html40;>
>> 
>> 
>> 
>> 
> Srinivas Vamsi Parasa has updated the pull request with a new target base due 
> to a merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 17 additional 
> commits since the last revision:
> 
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>  - add GCC version guards
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>  - Remove C++17 from C flags
>  - add avoid masked stores operation
>  - update the code to check for supported simd sort cpus
>  - Disable AVX2 sort for 64-bit types
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>  - fix jcheck failures due to windows encoding
>  - fix carriage return and change insertion sort thresholds
>  - ... and 7 more: https://git.openjdk.org/jdk/compare/dbbc5f0a...bc590d9f

Build changes look fine. You will still need the usual 2 reviewers from hotspot.

-

Marked as reviewed by ihse (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16534#pullrequestreview-1767374468


Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]

2023-12-05 Thread Jatin Bhateja
On Mon, 4 Dec 2023 22:15:24 GMT, Srinivas Vamsi Parasa  wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX2 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> For serial sort on random data, this PR shows upto ~7.5x improvement for 
>> 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the 
>> performance data below.
>> 
>> For parallel sort on random data, this PR shows upto ~3.4x for 32-bit 
>> datatypes (int, float) as shown below.
>> 
>> **Note:** This PR also improves the performance of AVX512 sort by upto 35%.
>> 
>> > xmlns:o="urn:schemas-microsoft-com:office:office"
>> xmlns:x="urn:schemas-microsoft-com:office:excel"
>> xmlns="http://www.w3.org/TR/REC-html40;>
>> 
>> 
>> 
>> 
>> 
>> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
>> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> Benchmark (Serial Sort) | Size | Baseline  (us/op) | AVX2 (us/op) | 
>> Speedup
>> -- | -- | -- | -- | --
>> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
>> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
>> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
>> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
>> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
>> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
>> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6
>> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3
>> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1
>> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
>> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
>> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
>> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
>> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
>> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
>> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7
>> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0
>> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6
>> 
>> 
>> 
>> 
>> 
>> > xmlns:o="urn:schemas-microsoft-com:office:office"
>> xmlns:x="urn:schemas-microsoft-com:office:excel"
>> xmlns="http://www.w3.org/TR/REC-html40;>
>> 
>> 
>> 
>> 
> Srinivas Vamsi Parasa has updated the pull request with a new target base due 
> to a merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 17 additional 
> commits since the last revision:
> 
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>  - add GCC version guards
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>  - Remove C++17 from C flags
>  - add avoid masked stores operation
>  - update the code to check for supported simd sort cpus
>  - Disable AVX2 sort for 64-bit types
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>  - fix jcheck failures due to windows encoding
>  - fix carriage return and change insertion sort thresholds
>  - ... and 7 more: https://git.openjdk.org/jdk/compare/021f5063...bc590d9f

src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 64:

> 62: }
> 63: return lut;
> 64: }();

Lut64 is needed for compress64 emulation, can be removed.

src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 234:

> 232: 
> 233: vtype::mask_storeu(leftStore, left, temp);
> 234: }

Can be removed if not being used.

src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 277:

> 275: 
> 276: return _mm_popcnt_u32(shortMask);
> 277: }

Can be removed if not being used.

src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 44:

> 42: break;
> 43: case JVM_T_FLOAT:
> 44: avx2_fast_sort((float*)array, from_index, to_index, 
> INSERTION_SORT_THRESHOLD_32BIT);

Assertions for unsupported types.

src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 56:

> 54: case JVM_T_FLOAT:
> 55: avx2_fast_partition((float*)array, from_index, to_index, 
> pivot_indices, index_pivot1, index_pivot2);
> 56: break;

Please add assertion for unsupported types.

src/java.base/linux/native/libsimdsort/avx512-32bit-qsort.hpp line 235:

> 233: return avx512_double_compressstore>(
> 234: left_addr, right_addr, k, reg);
> 235: }

Can be removed.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416191049
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416186096
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416186814
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416189371
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416189115
PR Review Comment: 

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]

2023-12-05 Thread Jatin Bhateja
On Mon, 4 Dec 2023 22:15:24 GMT, Srinivas Vamsi Parasa  wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX2 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> For serial sort on random data, this PR shows upto ~7.5x improvement for 
>> 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the 
>> performance data below.
>> 
>> For parallel sort on random data, this PR shows upto ~3.4x for 32-bit 
>> datatypes (int, float) as shown below.
>> 
>> **Note:** This PR also improves the performance of AVX512 sort by upto 35%.
>> 
>> > xmlns:o="urn:schemas-microsoft-com:office:office"
>> xmlns:x="urn:schemas-microsoft-com:office:excel"
>> xmlns="http://www.w3.org/TR/REC-html40;>
>> 
>> 
>> 
>> 
>> 
>> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
>> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> Benchmark (Serial Sort) | Size | Baseline  (us/op) | AVX2 (us/op) | 
>> Speedup
>> -- | -- | -- | -- | --
>> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
>> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
>> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
>> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
>> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
>> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
>> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6
>> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3
>> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1
>> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
>> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
>> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
>> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
>> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
>> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
>> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7
>> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0
>> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6
>> 
>> 
>> 
>> 
>> 
>> > xmlns:o="urn:schemas-microsoft-com:office:office"
>> xmlns:x="urn:schemas-microsoft-com:office:excel"
>> xmlns="http://www.w3.org/TR/REC-html40;>
>> 
>> 
>> 
>> 
> Srinivas Vamsi Parasa has updated the pull request with a new target base due 
> to a merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 17 additional 
> commits since the last revision:
> 
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>  - add GCC version guards
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>  - Remove C++17 from C flags
>  - add avoid masked stores operation
>  - update the code to check for supported simd sort cpus
>  - Disable AVX2 sort for 64-bit types
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>  - fix jcheck failures due to windows encoding
>  - fix carriage return and change insertion sort thresholds
>  - ... and 7 more: https://git.openjdk.org/jdk/compare/fc7d33f2...bc590d9f

src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 50:

> 48: case JVM_T_DOUBLE:
> 49: avx2_fast_sort((double*)array, from_index, to_index, 
> INSERTION_SORT_THRESHOLD_64BIT);
> 50: break;

Please add safe assertions for missing types.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416173543


Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]

2023-12-05 Thread Srinivas Vamsi Parasa
On Tue, 5 Dec 2023 11:19:00 GMT, Magnus Ihse Bursie  wrote:

>> Hi Marcus (@magicus), please see the updated code which added guards to 
>> check for GCC version >= 7.5 in 
>> `src/java.base/linux/native/libsimdsort/{avx2-linux-qsort.cpp, 
>> avx512-linux-qsort.cpp}`. GCC >= 7.5 is needed to compile libsimdsort using 
>> C++17 features. Made sure that OpenJDK builds without errors using both GCC 
>> 7.5 and GCC 6.4.
>
> That sounds weird. You can't check for if compiler options should be enabled 
> or not inside source code files.
> 
> Are you saying that when compiling with GCC 6, it will just silently ignore 
> `-std=c++17`? I'd have assumed that it printed a warning or error about an 
> unknown or invalid option, if C++17 is not supported.

Hi Magnus (@magicus),
 
> Are you saying that when compiling with GCC 6, it will just silently ignore 
> `-std=c++17`? I'd have assumed that it printed a warning or error about an 
> unknown or invalid option, if C++17 is not supported.

The GCC complier for versions 6 (and even 5) silently ignores the flag 
`-std=c++17`. It does not print any warning or error. I tested it with a toy 
C++ program and also by building OpenJDK using GCC 6. 

> You can't check for if compiler options should be enabled or not inside 
> source code files.

 what I meant was, there are #ifdef guards using predefined macros in the C++ 
source code to check for GCC version and make the simdsort code available for 
compilation or not based on the GCC version


// src/java.base/linux/native/libsimdsort/simdsort-support.hpp
#if defined(_LP64) && (defined(__GNUC__) && ((__GNUC__ > 7) || ((__GNUC__ == 7) 
&& (__GNUC_MINOR__ >= 5
#define __SIMDSORT_SUPPORTED_LINUX
#endif



//src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp
#include "simdsort-support.hpp"
#ifdef __SIMDSORT_SUPPORTED_LINUX

#endif

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416037340


Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]

2023-12-05 Thread Magnus Ihse Bursie
On Mon, 4 Dec 2023 22:14:14 GMT, Srinivas Vamsi Parasa  wrote:

>> Then you must add logic to check for this. Now the build will just fail if 
>> building with an older gcc. That is not acceptable.
>
> Hi Marcus (@magicus), please see the updated code which added guards to check 
> for GCC version >= 7.5 in 
> `src/java.base/linux/native/libsimdsort/{avx2-linux-qsort.cpp, 
> avx512-linux-qsort.cpp}`. GCC >= 7.5 is needed to compile libsimdsort using 
> C++17 features. Made sure that OpenJDK builds without errors using both GCC 
> 7.5 and GCC 6.4.

That sounds weird. You can't check for if compiler options should be enabled or 
not inside source code files.

Are you saying that when compiling with GCC 6, it will just silently ignore 
`-std=c++17`? I'd have assumed that it printed a warning or error about an 
unknown or invalid option, if C++17 is not supported.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1415392615


Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]

2023-12-04 Thread Srinivas Vamsi Parasa
On Mon, 4 Dec 2023 22:15:24 GMT, Srinivas Vamsi Parasa  wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX2 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> For serial sort on random data, this PR shows upto ~7.5x improvement for 
>> 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the 
>> performance data below.
>> 
>> For parallel sort on random data, this PR shows upto ~3.4x for 32-bit 
>> datatypes (int, float) as shown below.
>> 
>> **Note:** This PR also improves the performance of AVX512 sort by upto 35%.
>> 
>> > xmlns:o="urn:schemas-microsoft-com:office:office"
>> xmlns:x="urn:schemas-microsoft-com:office:excel"
>> xmlns="http://www.w3.org/TR/REC-html40;>
>> 
>> 
>> 
>> 
>> 
>> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
>> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> Benchmark (Serial Sort) | Size | Baseline  (us/op) | AVX2 (us/op) | 
>> Speedup
>> -- | -- | -- | -- | --
>> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
>> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
>> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
>> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
>> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
>> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
>> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6
>> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3
>> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1
>> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
>> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
>> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
>> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
>> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
>> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
>> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7
>> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0
>> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6
>> 
>> 
>> 
>> 
>> 
>> > xmlns:o="urn:schemas-microsoft-com:office:office"
>> xmlns:x="urn:schemas-microsoft-com:office:excel"
>> xmlns="http://www.w3.org/TR/REC-html40;>
>> 
>> 
>> 
>> 
> Srinivas Vamsi Parasa has updated the pull request with a new target base due 
> to a merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 17 additional 
> commits since the last revision:
> 
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>  - add GCC version guards
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>  - Remove C++17 from C flags
>  - add avoid masked stores operation
>  - update the code to check for supported simd sort cpus
>  - Disable AVX2 sort for 64-bit types
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>  - fix jcheck failures due to windows encoding
>  - fix carriage return and change insertion sort thresholds
>  - ... and 7 more: https://git.openjdk.org/jdk/compare/0dc47dcf...bc590d9f

Hello Vladimir (@vnkozlov),

Could you please review this PR?

Thanks,
Vamsi

-

PR Comment: https://git.openjdk.org/jdk/pull/16534#issuecomment-1839810768


Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]

2023-12-04 Thread Sandhya Viswanathan
On Mon, 4 Dec 2023 22:15:24 GMT, Srinivas Vamsi Parasa  wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX2 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> For serial sort on random data, this PR shows upto ~7.5x improvement for 
>> 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the 
>> performance data below.
>> 
>> For parallel sort on random data, this PR shows upto ~3.4x for 32-bit 
>> datatypes (int, float) as shown below.
>> 
>> **Note:** This PR also improves the performance of AVX512 sort by upto 35%.
>> 
>> > xmlns:o="urn:schemas-microsoft-com:office:office"
>> xmlns:x="urn:schemas-microsoft-com:office:excel"
>> xmlns="http://www.w3.org/TR/REC-html40;>
>> 
>> 
>> 
>> 
>> 
>> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
>> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> Benchmark (Serial Sort) | Size | Baseline  (us/op) | AVX2 (us/op) | 
>> Speedup
>> -- | -- | -- | -- | --
>> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
>> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
>> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
>> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
>> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
>> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
>> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6
>> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3
>> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1
>> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
>> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
>> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
>> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
>> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
>> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
>> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7
>> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0
>> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6
>> 
>> 
>> 
>> 
>> 
>> > xmlns:o="urn:schemas-microsoft-com:office:office"
>> xmlns:x="urn:schemas-microsoft-com:office:excel"
>> xmlns="http://www.w3.org/TR/REC-html40;>
>> 
>> 
>> 
>> 
> Srinivas Vamsi Parasa has updated the pull request with a new target base due 
> to a merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 17 additional 
> commits since the last revision:
> 
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>  - add GCC version guards
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>  - Remove C++17 from C flags
>  - add avoid masked stores operation
>  - update the code to check for supported simd sort cpus
>  - Disable AVX2 sort for 64-bit types
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>  - fix jcheck failures due to windows encoding
>  - fix carriage return and change insertion sort thresholds
>  - ... and 7 more: https://git.openjdk.org/jdk/compare/d4804a12...bc590d9f

The PR looks good to me.

-

Marked as reviewed by sviswanathan (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16534#pullrequestreview-1763680754


Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]

2023-12-04 Thread Srinivas Vamsi Parasa
> The goal is to develop faster sort routines for x86_64 CPUs by taking 
> advantage of AVX2 instructions. This enhancement provides an order of 
> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
> 
> For serial sort on random data, this PR shows upto ~7.5x improvement for 
> 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the 
> performance data below.
> 
> For parallel sort on random data, this PR shows upto ~3.4x for 32-bit 
> datatypes (int, float) as shown below.
> 
> **Note:** This PR also improves the performance of AVX512 sort by upto 35%.
> 
>  xmlns:o="urn:schemas-microsoft-com:office:office"
> xmlns:x="urn:schemas-microsoft-com:office:excel"
> xmlns="http://www.w3.org/TR/REC-html40;>
> 
> 
> 
> 
> 
>  href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
>  href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">
> 
> 
> 
> 
> 
> 
> 
> 
> Benchmark (Serial Sort) | Size | Baseline  (us/op) | AVX2 (us/op) | 
> Speedup
> -- | -- | -- | -- | --
> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6
> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3
> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1
> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7
> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0
> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6
> 
> 
> 
> 
> 
>  xmlns:o="urn:schemas-microsoft-com:office:office"
> xmlns:x="urn:schemas-microsoft-com:office:excel"
> xmlns="http://www.w3.org/TR/REC-html40;>
> 
> 
> 
> 
> 
>  href="file:///C:/Users/...

Srinivas Vamsi Parasa has updated the pull request with a new target base due 
to a merge or a rebase. The incremental webrev excludes the unrelated changes 
brought in by the merge/rebase. The pull request contains 17 additional commits 
since the last revision:

 - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
 - add GCC version guards
 - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
 - Remove C++17 from C flags
 - add avoid masked stores operation
 - update the code to check for supported simd sort cpus
 - Disable AVX2 sort for 64-bit types
 - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
 - fix jcheck failures due to windows encoding
 - fix carriage return and change insertion sort thresholds
 - ... and 7 more: https://git.openjdk.org/jdk/compare/0b17dc14...bc590d9f

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/16534/files
  - new: https://git.openjdk.org/jdk/pull/16534/files/bb5f711a..bc590d9f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=16534=07
 - incr: https://webrevs.openjdk.org/?repo=jdk=16534=06-07

  Stats: 36176 lines in 1033 files changed: 17425 ins; 14551 del; 4200 mod
  Patch: https://git.openjdk.org/jdk/pull/16534.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16534/head:pull/16534

PR: https://git.openjdk.org/jdk/pull/16534


Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]

2023-12-04 Thread Srinivas Vamsi Parasa
On Mon, 4 Dec 2023 11:48:44 GMT, Magnus Ihse Bursie  wrote:

>>> But you are saying that you want to skip building this library unless you 
>>> have a gcc version that supports c++17?
>>> 
>> Yes, the request is to skip building the simdsort library if GCC version is 
>> < 8 as only GCC >= 8 supports C++17 features.
>
> Then you must add logic to check for this. Now the build will just fail if 
> building with an older gcc. That is not acceptable.

Hi Marcus (@magicus), please see the updated code which added guards to check 
for GCC version >= 7.5 in 
`src/java.base/linux/native/libsimdsort/{avx2-linux-qsort.cpp, 
avx512-linux-qsort.cpp}`. GCC >= 7.5 is needed to compile libsimdsort using 
C++17 features. Made sure that OpenJDK builds without errors using both GCC 7.5 
and GCC 6.4.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1414570644