On Mon, 4 Dec 2023 22:15:24 GMT, Srinivas Vamsi Parasa <d...@openjdk.org> wrote:
>> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX2 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >> >> For serial sort on random data, this PR shows upto ~7.5x improvement for >> 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the >> performance data below. >> >> For parallel sort on random data, this PR shows upto ~3.4x for 32-bit >> datatypes (int, float) as shown below. >> >> **Note:** This PR also improves the performance of AVX512 sort by upto 35%. >> >> <html xmlns:v="urn:schemas-microsoft-com:vml" >> xmlns:o="urn:schemas-microsoft-com:office:office" >> xmlns:x="urn:schemas-microsoft-com:office:excel" >> xmlns="http://www.w3.org/TR/REC-html40"> >> >> <head> >> >> <meta name=ProgId content=Excel.Sheet> >> <meta name=Generator content="Microsoft Excel 15"> >> <link id=Main-File rel=Main-File >> href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm"> >> <link rel=File-List >> href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml"> >> >> >> </head> >> >> <body link="#0563C1" vlink="#954F72"> >> >> >> >> Benchmark (Serial Sort) | Size | Baseline (us/op) | AVX2 (us/op) | >> Speedup >> -- | -- | -- | -- | -- >> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2 >> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0 >> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5 >> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5 >> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2 >> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4 >> ArraysSort.intSort | 10000 | 330.065 | 43.383 | 7.6 >> ArraysSort.intSort | 100000 | 4099.527 | 778.943 | 5.3 >> ArraysSort.intSort | 1000000 | 49150.16 | 9634.335 | 5.1 >> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0 >> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4 >> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3 >> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0 >> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9 >> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4 >> ArraysSort.floatSort | 10000 | 323.136 | 57.175 | 5.7 >> ArraysSort.floatSort | 100000 | 4299.519 | 862.63 | 5.0 >> ArraysSort.floatSort | 1000000 | 50889.4 | 10972.19 | 4.6 >> >> </body> >> >> </html> >> >> <html xmlns:v="urn:schemas-microsoft-com:vml" >> xmlns:o="urn:schemas-microsoft-com:office:office" >> xmlns:x="urn:schemas-microsoft-com:office:excel" >> xmlns="http://www.w3.org/TR/REC-html40"> >> >> <head> >> >> <meta name=ProgId cont... > > Srinivas Vamsi Parasa has updated the pull request with a new target base due > to a merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains 17 additional > commits since the last revision: > > - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort > - add GCC version guards > - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort > - Remove C++17 from C flags > - add avoid masked stores operation > - update the code to check for supported simd sort cpus > - Disable AVX2 sort for 64-bit types > - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort > - fix jcheck failures due to windows encoding > - fix carriage return and change insertion sort thresholds > - ... and 7 more: https://git.openjdk.org/jdk/compare/021f5063...bc590d9f src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 64: > 62: } > 63: return lut; > 64: }(); Lut64 is needed for compress64 emulation, can be removed. src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 234: > 232: > 233: vtype::mask_storeu(leftStore, left, temp); > 234: } Can be removed if not being used. src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 277: > 275: > 276: return _mm_popcnt_u32(shortMask); > 277: } Can be removed if not being used. src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 44: > 42: break; > 43: case JVM_T_FLOAT: > 44: avx2_fast_sort((float*)array, from_index, to_index, > INSERTION_SORT_THRESHOLD_32BIT); Assertions for unsupported types. src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 56: > 54: case JVM_T_FLOAT: > 55: avx2_fast_partition((float*)array, from_index, to_index, > pivot_indices, index_pivot1, index_pivot2); > 56: break; Please add assertion for unsupported types. src/java.base/linux/native/libsimdsort/avx512-32bit-qsort.hpp line 235: > 233: return avx512_double_compressstore<zmm_vector<type_t>>( > 234: left_addr, right_addr, k, reg); > 235: } Can be removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416191049 PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416186096 PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416186814 PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416189371 PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416189115 PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416187350