Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]
On Tue, 5 Dec 2023 19:37:34 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request with a new target base >> due to a merge or a rebase. The incremental webrev excludes the unrelated >> changes brought in by the merge/rebase. The pull request contains 17 >> additional commits since the last revision: >> >> - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort >> - add GCC version guards >> - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort >> - Remove C++17 from C flags >> - add avoid masked stores operation >> - update the code to check for supported simd sort cpus >> - Disable AVX2 sort for 64-bit types >> - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort >> - fix jcheck failures due to windows encoding >> - fix carriage return and change insertion sort thresholds >> - ... and 7 more: https://git.openjdk.org/jdk/compare/d4151e5b...bc590d9f > > src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 64: > >> 62: } >> 63: return lut; >> 64: }(); > > Lut64 is needed for compress64 emulation, can be removed. Removed in the latest commit... > src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 234: > >> 232: >> 233: vtype::mask_storeu(leftStore, left, temp); >> 234: } > > Can be removed if not being used. Removed in the latest commit... > src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 277: > >> 275: >> 276: return _mm_popcnt_u32(shortMask); >> 277: } > > Can be removed if not being used. Removed in the latest commit... > src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 44: > >> 42: break; >> 43: case JVM_T_FLOAT: >> 44: avx2_fast_sort((float*)array, from_index, to_index, >> INSERTION_SORT_THRESHOLD_32BIT); > > Assertions for unsupported types. Added in the latest commit... > src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 56: > >> 54: case JVM_T_FLOAT: >> 55: avx2_fast_partition((float*)array, from_index, to_index, >> pivot_indices, index_pivot1, index_pivot2); >> 56: break; > > Please add assertion for unsupported types. Added in the latest commit... - PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417701182 PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417702999 PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417702251 PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417701469 PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417701705
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]
On Tue, 5 Dec 2023 19:33:48 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request with a new target base >> due to a merge or a rebase. The incremental webrev excludes the unrelated >> changes brought in by the merge/rebase. The pull request contains 17 >> additional commits since the last revision: >> >> - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort >> - add GCC version guards >> - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort >> - Remove C++17 from C flags >> - add avoid masked stores operation >> - update the code to check for supported simd sort cpus >> - Disable AVX2 sort for 64-bit types >> - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort >> - fix jcheck failures due to windows encoding >> - fix carriage return and change insertion sort thresholds >> - ... and 7 more: https://git.openjdk.org/jdk/compare/d8b29378...bc590d9f > > src/java.base/linux/native/libsimdsort/avx512-32bit-qsort.hpp line 235: > >> 233: return avx512_double_compressstore>( >> 234: left_addr, right_addr, k, reg); >> 235: } > > Can be removed. This is needed for AVX512 sort... - PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417690992
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]
On Tue, 5 Dec 2023 17:26:06 GMT, Srinivas Vamsi Parasa wrote: >> That sounds weird. You can't check for if compiler options should be enabled >> or not inside source code files. >> >> Are you saying that when compiling with GCC 6, it will just silently ignore >> `-std=c++17`? I'd have assumed that it printed a warning or error about an >> unknown or invalid option, if C++17 is not supported. > > Hi Magnus (@magicus), > >> Are you saying that when compiling with GCC 6, it will just silently ignore >> `-std=c++17`? I'd have assumed that it printed a warning or error about an >> unknown or invalid option, if C++17 is not supported. > > The GCC complier for versions 6 (and even 5) silently ignores the flag > `-std=c++17`. It does not print any warning or error. I tested it with a toy > C++ program and also by building OpenJDK using GCC 6. > >> You can't check for if compiler options should be enabled or not inside >> source code files. > > what I meant was, there are #ifdef guards using predefined macros in the C++ > source code to check for GCC version and make the simdsort code available for > compilation or not based on the GCC version > > > // src/java.base/linux/native/libsimdsort/simdsort-support.hpp > #if defined(_LP64) && (defined(__GNUC__) && ((__GNUC__ > 7) || ((__GNUC__ == > 7) && (__GNUC_MINOR__ >= 5 > #define __SIMDSORT_SUPPORTED_LINUX > #endif > > > > //src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp > #include "simdsort-support.hpp" > #ifdef __SIMDSORT_SUPPORTED_LINUX > > #endif Okay, then I guess I am fine with this. - PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1417170882
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]
On Mon, 4 Dec 2023 22:15:24 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX2 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> For serial sort on random data, this PR shows upto ~7.5x improvement for >> 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the >> performance data below. >> >> For parallel sort on random data, this PR shows upto ~3.4x for 32-bit >> datatypes (int, float) as shown below. >> >> **Note:** This PR also improves the performance of AVX512 sort by upto 35%. >> >> > xmlns:o="urn:schemas-microsoft-com:office:office" >> xmlns:x="urn:schemas-microsoft-com:office:excel" >> xmlns="http://www.w3.org/TR/REC-html40;> >> >> >> >> >> >> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm"> >> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml"> >> >> >> >> >> >> >> >> >> Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | >> Speedup >> -- | -- | -- | -- | -- >> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2 >> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0 >> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5 >> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5 >> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2 >> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4 >> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6 >> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3 >> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1 >> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0 >> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4 >> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3 >> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0 >> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9 >> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4 >> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7 >> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0 >> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6 >> >> >> >> >> >> > xmlns:o="urn:schemas-microsoft-com:office:office" >> xmlns:x="urn:schemas-microsoft-com:office:excel" >> xmlns="http://www.w3.org/TR/REC-html40;> >> >> >> >> > Srinivas Vamsi Parasa has updated the pull request with a new target base due > to a merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains 17 additional > commits since the last revision: > > - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort > - add GCC version guards > - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort > - Remove C++17 from C flags > - add avoid masked stores operation > - update the code to check for supported simd sort cpus > - Disable AVX2 sort for 64-bit types > - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort > - fix jcheck failures due to windows encoding > - fix carriage return and change insertion sort thresholds > - ... and 7 more: https://git.openjdk.org/jdk/compare/dbbc5f0a...bc590d9f Build changes look fine. You will still need the usual 2 reviewers from hotspot. - Marked as reviewed by ihse (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16534#pullrequestreview-1767374468
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]
On Mon, 4 Dec 2023 22:15:24 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX2 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> For serial sort on random data, this PR shows upto ~7.5x improvement for >> 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the >> performance data below. >> >> For parallel sort on random data, this PR shows upto ~3.4x for 32-bit >> datatypes (int, float) as shown below. >> >> **Note:** This PR also improves the performance of AVX512 sort by upto 35%. >> >> > xmlns:o="urn:schemas-microsoft-com:office:office" >> xmlns:x="urn:schemas-microsoft-com:office:excel" >> xmlns="http://www.w3.org/TR/REC-html40;> >> >> >> >> >> >> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm"> >> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml"> >> >> >> >> >> >> >> >> >> Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | >> Speedup >> -- | -- | -- | -- | -- >> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2 >> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0 >> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5 >> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5 >> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2 >> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4 >> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6 >> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3 >> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1 >> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0 >> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4 >> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3 >> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0 >> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9 >> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4 >> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7 >> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0 >> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6 >> >> >> >> >> >> > xmlns:o="urn:schemas-microsoft-com:office:office" >> xmlns:x="urn:schemas-microsoft-com:office:excel" >> xmlns="http://www.w3.org/TR/REC-html40;> >> >> >> >> > Srinivas Vamsi Parasa has updated the pull request with a new target base due > to a merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains 17 additional > commits since the last revision: > > - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort > - add GCC version guards > - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort > - Remove C++17 from C flags > - add avoid masked stores operation > - update the code to check for supported simd sort cpus > - Disable AVX2 sort for 64-bit types > - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort > - fix jcheck failures due to windows encoding > - fix carriage return and change insertion sort thresholds > - ... and 7 more: https://git.openjdk.org/jdk/compare/021f5063...bc590d9f src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 64: > 62: } > 63: return lut; > 64: }(); Lut64 is needed for compress64 emulation, can be removed. src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 234: > 232: > 233: vtype::mask_storeu(leftStore, left, temp); > 234: } Can be removed if not being used. src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 277: > 275: > 276: return _mm_popcnt_u32(shortMask); > 277: } Can be removed if not being used. src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 44: > 42: break; > 43: case JVM_T_FLOAT: > 44: avx2_fast_sort((float*)array, from_index, to_index, > INSERTION_SORT_THRESHOLD_32BIT); Assertions for unsupported types. src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 56: > 54: case JVM_T_FLOAT: > 55: avx2_fast_partition((float*)array, from_index, to_index, > pivot_indices, index_pivot1, index_pivot2); > 56: break; Please add assertion for unsupported types. src/java.base/linux/native/libsimdsort/avx512-32bit-qsort.hpp line 235: > 233: return avx512_double_compressstore>( > 234: left_addr, right_addr, k, reg); > 235: } Can be removed. - PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416191049 PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416186096 PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416186814 PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416189371 PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416189115 PR Review Comment:
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]
On Mon, 4 Dec 2023 22:15:24 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX2 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> For serial sort on random data, this PR shows upto ~7.5x improvement for >> 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the >> performance data below. >> >> For parallel sort on random data, this PR shows upto ~3.4x for 32-bit >> datatypes (int, float) as shown below. >> >> **Note:** This PR also improves the performance of AVX512 sort by upto 35%. >> >> > xmlns:o="urn:schemas-microsoft-com:office:office" >> xmlns:x="urn:schemas-microsoft-com:office:excel" >> xmlns="http://www.w3.org/TR/REC-html40;> >> >> >> >> >> >> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm"> >> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml"> >> >> >> >> >> >> >> >> >> Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | >> Speedup >> -- | -- | -- | -- | -- >> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2 >> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0 >> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5 >> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5 >> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2 >> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4 >> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6 >> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3 >> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1 >> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0 >> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4 >> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3 >> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0 >> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9 >> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4 >> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7 >> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0 >> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6 >> >> >> >> >> >> > xmlns:o="urn:schemas-microsoft-com:office:office" >> xmlns:x="urn:schemas-microsoft-com:office:excel" >> xmlns="http://www.w3.org/TR/REC-html40;> >> >> >> >> > Srinivas Vamsi Parasa has updated the pull request with a new target base due > to a merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains 17 additional > commits since the last revision: > > - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort > - add GCC version guards > - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort > - Remove C++17 from C flags > - add avoid masked stores operation > - update the code to check for supported simd sort cpus > - Disable AVX2 sort for 64-bit types > - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort > - fix jcheck failures due to windows encoding > - fix carriage return and change insertion sort thresholds > - ... and 7 more: https://git.openjdk.org/jdk/compare/fc7d33f2...bc590d9f src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 50: > 48: case JVM_T_DOUBLE: > 49: avx2_fast_sort((double*)array, from_index, to_index, > INSERTION_SORT_THRESHOLD_64BIT); > 50: break; Please add safe assertions for missing types. - PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416173543
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]
On Tue, 5 Dec 2023 11:19:00 GMT, Magnus Ihse Bursie wrote: >> Hi Marcus (@magicus), please see the updated code which added guards to >> check for GCC version >= 7.5 in >> `src/java.base/linux/native/libsimdsort/{avx2-linux-qsort.cpp, >> avx512-linux-qsort.cpp}`. GCC >= 7.5 is needed to compile libsimdsort using >> C++17 features. Made sure that OpenJDK builds without errors using both GCC >> 7.5 and GCC 6.4. > > That sounds weird. You can't check for if compiler options should be enabled > or not inside source code files. > > Are you saying that when compiling with GCC 6, it will just silently ignore > `-std=c++17`? I'd have assumed that it printed a warning or error about an > unknown or invalid option, if C++17 is not supported. Hi Magnus (@magicus), > Are you saying that when compiling with GCC 6, it will just silently ignore > `-std=c++17`? I'd have assumed that it printed a warning or error about an > unknown or invalid option, if C++17 is not supported. The GCC complier for versions 6 (and even 5) silently ignores the flag `-std=c++17`. It does not print any warning or error. I tested it with a toy C++ program and also by building OpenJDK using GCC 6. > You can't check for if compiler options should be enabled or not inside > source code files. what I meant was, there are #ifdef guards using predefined macros in the C++ source code to check for GCC version and make the simdsort code available for compilation or not based on the GCC version // src/java.base/linux/native/libsimdsort/simdsort-support.hpp #if defined(_LP64) && (defined(__GNUC__) && ((__GNUC__ > 7) || ((__GNUC__ == 7) && (__GNUC_MINOR__ >= 5 #define __SIMDSORT_SUPPORTED_LINUX #endif //src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp #include "simdsort-support.hpp" #ifdef __SIMDSORT_SUPPORTED_LINUX #endif - PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416037340
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]
On Mon, 4 Dec 2023 22:14:14 GMT, Srinivas Vamsi Parasa wrote: >> Then you must add logic to check for this. Now the build will just fail if >> building with an older gcc. That is not acceptable. > > Hi Marcus (@magicus), please see the updated code which added guards to check > for GCC version >= 7.5 in > `src/java.base/linux/native/libsimdsort/{avx2-linux-qsort.cpp, > avx512-linux-qsort.cpp}`. GCC >= 7.5 is needed to compile libsimdsort using > C++17 features. Made sure that OpenJDK builds without errors using both GCC > 7.5 and GCC 6.4. That sounds weird. You can't check for if compiler options should be enabled or not inside source code files. Are you saying that when compiling with GCC 6, it will just silently ignore `-std=c++17`? I'd have assumed that it printed a warning or error about an unknown or invalid option, if C++17 is not supported. - PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1415392615
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]
On Mon, 4 Dec 2023 22:15:24 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX2 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> For serial sort on random data, this PR shows upto ~7.5x improvement for >> 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the >> performance data below. >> >> For parallel sort on random data, this PR shows upto ~3.4x for 32-bit >> datatypes (int, float) as shown below. >> >> **Note:** This PR also improves the performance of AVX512 sort by upto 35%. >> >> > xmlns:o="urn:schemas-microsoft-com:office:office" >> xmlns:x="urn:schemas-microsoft-com:office:excel" >> xmlns="http://www.w3.org/TR/REC-html40;> >> >> >> >> >> >> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm"> >> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml"> >> >> >> >> >> >> >> >> >> Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | >> Speedup >> -- | -- | -- | -- | -- >> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2 >> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0 >> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5 >> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5 >> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2 >> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4 >> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6 >> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3 >> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1 >> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0 >> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4 >> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3 >> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0 >> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9 >> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4 >> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7 >> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0 >> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6 >> >> >> >> >> >> > xmlns:o="urn:schemas-microsoft-com:office:office" >> xmlns:x="urn:schemas-microsoft-com:office:excel" >> xmlns="http://www.w3.org/TR/REC-html40;> >> >> >> >> > Srinivas Vamsi Parasa has updated the pull request with a new target base due > to a merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains 17 additional > commits since the last revision: > > - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort > - add GCC version guards > - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort > - Remove C++17 from C flags > - add avoid masked stores operation > - update the code to check for supported simd sort cpus > - Disable AVX2 sort for 64-bit types > - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort > - fix jcheck failures due to windows encoding > - fix carriage return and change insertion sort thresholds > - ... and 7 more: https://git.openjdk.org/jdk/compare/0dc47dcf...bc590d9f Hello Vladimir (@vnkozlov), Could you please review this PR? Thanks, Vamsi - PR Comment: https://git.openjdk.org/jdk/pull/16534#issuecomment-1839810768
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]
On Mon, 4 Dec 2023 22:15:24 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX2 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> For serial sort on random data, this PR shows upto ~7.5x improvement for >> 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the >> performance data below. >> >> For parallel sort on random data, this PR shows upto ~3.4x for 32-bit >> datatypes (int, float) as shown below. >> >> **Note:** This PR also improves the performance of AVX512 sort by upto 35%. >> >> > xmlns:o="urn:schemas-microsoft-com:office:office" >> xmlns:x="urn:schemas-microsoft-com:office:excel" >> xmlns="http://www.w3.org/TR/REC-html40;> >> >> >> >> >> >> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm"> >> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml"> >> >> >> >> >> >> >> >> >> Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | >> Speedup >> -- | -- | -- | -- | -- >> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2 >> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0 >> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5 >> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5 >> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2 >> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4 >> ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6 >> ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3 >> ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1 >> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0 >> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4 >> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3 >> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0 >> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9 >> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4 >> ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7 >> ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0 >> ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6 >> >> >> >> >> >> > xmlns:o="urn:schemas-microsoft-com:office:office" >> xmlns:x="urn:schemas-microsoft-com:office:excel" >> xmlns="http://www.w3.org/TR/REC-html40;> >> >> >> >> > Srinivas Vamsi Parasa has updated the pull request with a new target base due > to a merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains 17 additional > commits since the last revision: > > - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort > - add GCC version guards > - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort > - Remove C++17 from C flags > - add avoid masked stores operation > - update the code to check for supported simd sort cpus > - Disable AVX2 sort for 64-bit types > - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort > - fix jcheck failures due to windows encoding > - fix carriage return and change insertion sort thresholds > - ... and 7 more: https://git.openjdk.org/jdk/compare/d4804a12...bc590d9f The PR looks good to me. - Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16534#pullrequestreview-1763680754
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]
> The goal is to develop faster sort routines for x86_64 CPUs by taking > advantage of AVX2 instructions. This enhancement provides an order of > magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > For serial sort on random data, this PR shows upto ~7.5x improvement for > 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the > performance data below. > > For parallel sort on random data, this PR shows upto ~3.4x for 32-bit > datatypes (int, float) as shown below. > > **Note:** This PR also improves the performance of AVX512 sort by upto 35%. > > xmlns:o="urn:schemas-microsoft-com:office:office" > xmlns:x="urn:schemas-microsoft-com:office:excel" > xmlns="http://www.w3.org/TR/REC-html40;> > > > > > > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm"> > href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml"> > > > > > > > > > Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | > Speedup > -- | -- | -- | -- | -- > ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2 > ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0 > ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5 > ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5 > ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2 > ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4 > ArraysSort.intSort | 1 | 330.065 | 43.383 | 7.6 > ArraysSort.intSort | 10 | 4099.527 | 778.943 | 5.3 > ArraysSort.intSort | 100 | 49150.16 | 9634.335 | 5.1 > ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0 > ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4 > ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3 > ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0 > ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9 > ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4 > ArraysSort.floatSort | 1 | 323.136 | 57.175 | 5.7 > ArraysSort.floatSort | 10 | 4299.519 | 862.63 | 5.0 > ArraysSort.floatSort | 100 | 50889.4 | 10972.19 | 4.6 > > > > > > xmlns:o="urn:schemas-microsoft-com:office:office" > xmlns:x="urn:schemas-microsoft-com:office:excel" > xmlns="http://www.w3.org/TR/REC-html40;> > > > > > > href="file:///C:/Users/... Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort - add GCC version guards - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort - Remove C++17 from C flags - add avoid masked stores operation - update the code to check for supported simd sort cpus - Disable AVX2 sort for 64-bit types - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort - fix jcheck failures due to windows encoding - fix carriage return and change insertion sort thresholds - ... and 7 more: https://git.openjdk.org/jdk/compare/0b17dc14...bc590d9f - Changes: - all: https://git.openjdk.org/jdk/pull/16534/files - new: https://git.openjdk.org/jdk/pull/16534/files/bb5f711a..bc590d9f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=16534=07 - incr: https://webrevs.openjdk.org/?repo=jdk=16534=06-07 Stats: 36176 lines in 1033 files changed: 17425 ins; 14551 del; 4200 mod Patch: https://git.openjdk.org/jdk/pull/16534.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16534/head:pull/16534 PR: https://git.openjdk.org/jdk/pull/16534
Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]
On Mon, 4 Dec 2023 11:48:44 GMT, Magnus Ihse Bursie wrote: >>> But you are saying that you want to skip building this library unless you >>> have a gcc version that supports c++17? >>> >> Yes, the request is to skip building the simdsort library if GCC version is >> < 8 as only GCC >= 8 supports C++17 features. > > Then you must add logic to check for this. Now the build will just fail if > building with an older gcc. That is not acceptable. Hi Marcus (@magicus), please see the updated code which added guards to check for GCC version >= 7.5 in `src/java.base/linux/native/libsimdsort/{avx2-linux-qsort.cpp, avx512-linux-qsort.cpp}`. GCC >= 7.5 is needed to compile libsimdsort using C++17 features. Made sure that OpenJDK builds without errors using both GCC 7.5 and GCC 6.4. - PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1414570644