Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]

Jatin Bhateja Tue, 05 Dec 2023 11:49:36 -0800

On Mon, 4 Dec 2023 22:15:24 GMT, Srinivas Vamsi Parasa <[email protected]> wrote:


>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX2 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> For serial sort on random data, this PR shows upto ~7.5x improvement for 
>> 32-bit datatypes (int, float) on Intel TigerLake machine as shown in the 
>> performance data below.
>> 
>> For parallel sort on random data, this PR shows upto ~3.4x for 32-bit 
>> datatypes (int, float) as shown below.
>> 
>> **Note:** This PR also improves the performance of AVX512 sort by upto 35%.
>> 
>> <html xmlns:v="urn:schemas-microsoft-com:vml"
>> xmlns:o="urn:schemas-microsoft-com:office:office"
>> xmlns:x="urn:schemas-microsoft-com:office:excel"
>> xmlns="http://www.w3.org/TR/REC-html40";>
>> 
>> <head>
>> 
>> <meta name=ProgId content=Excel.Sheet>
>> <meta name=Generator content="Microsoft Excel 15">
>> <link id=Main-File rel=Main-File
>> href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
>> <link rel=File-List
>> href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">
>> 
>> 
>> </head>
>> 
>> <body link="#0563C1" vlink="#954F72">
>> 
>> 
>> 
>> Benchmark (Serial Sort) | Size | Baseline      (us/op) | AVX2     (us/op) | 
>> Speedup
>> -- | -- | -- | -- | --
>> ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
>> ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
>> ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
>> ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
>> ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
>> ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
>> ArraysSort.intSort | 10000 | 330.065 | 43.383 | 7.6
>> ArraysSort.intSort | 100000 | 4099.527 | 778.943 | 5.3
>> ArraysSort.intSort | 1000000 | 49150.16 | 9634.335 | 5.1
>> ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
>> ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
>> ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
>> ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
>> ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
>> ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
>> ArraysSort.floatSort | 10000 | 323.136 | 57.175 | 5.7
>> ArraysSort.floatSort | 100000 | 4299.519 | 862.63 | 5.0
>> ArraysSort.floatSort | 1000000 | 50889.4 | 10972.19 | 4.6
>> 
>> </body>
>> 
>> </html>
>> 
>> <html xmlns:v="urn:schemas-microsoft-com:vml"
>> xmlns:o="urn:schemas-microsoft-com:office:office"
>> xmlns:x="urn:schemas-microsoft-com:office:excel"
>> xmlns="http://www.w3.org/TR/REC-html40";>
>> 
>> <head>
>> 
>> <meta name=ProgId cont...
>
> Srinivas Vamsi Parasa has updated the pull request with a new target base due 
> to a merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 17 additional 
> commits since the last revision:
> 
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>  - add GCC version guards
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>  - Remove C++17 from C flags
>  - add avoid masked stores operation
>  - update the code to check for supported simd sort cpus
>  - Disable AVX2 sort for 64-bit types
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into simdsort
>  - fix jcheck failures due to windows encoding
>  - fix carriage return and change insertion sort thresholds
>  - ... and 7 more: https://git.openjdk.org/jdk/compare/021f5063...bc590d9f

src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 64:

> 62:     }
> 63:     return lut;
> 64: }();

Lut64 is needed for compress64 emulation, can be removed.

src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 234:

> 232: 
> 233:     vtype::mask_storeu(leftStore, left, temp);
> 234: }

Can be removed if not being used.

src/java.base/linux/native/libsimdsort/avx2-emu-funcs.hpp line 277:

> 275: 
> 276:     return _mm_popcnt_u32(shortMask);
> 277: }

Can be removed if not being used.

src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 44:

> 42:                 break;
> 43:             case JVM_T_FLOAT:
> 44:                 avx2_fast_sort((float*)array, from_index, to_index, 
> INSERTION_SORT_THRESHOLD_32BIT);

Assertions for unsupported types.

src/java.base/linux/native/libsimdsort/avx2-linux-qsort.cpp line 56:

> 54:             case JVM_T_FLOAT:
> 55:                 avx2_fast_partition((float*)array, from_index, to_index, 
> pivot_indices, index_pivot1, index_pivot2);
> 56:                 break;

Please add assertion for unsupported types.

src/java.base/linux/native/libsimdsort/avx512-32bit-qsort.hpp line 235:

> 233:         return avx512_double_compressstore<zmm_vector<type_t>>(
> 234:             left_addr, right_addr, k, reg);
> 235:     }

Can be removed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416191049
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416186096
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416186814
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416189371
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416189115
PR Review Comment: https://git.openjdk.org/jdk/pull/16534#discussion_r1416187350

Re: RFR: 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, float arrays) [v8]

Reply via email to