The goal is to develop faster sort routines for x86_64 CPUs by taking advantage 
of AVX2 instructions. This enhancement provides an order of magnitude speedup 
for Arrays.sort() using int, long, float and double arrays.

For serial sort on random data, this PR shows upto ~7.5x improvement for 32-bit 
datatypes (int, float) and upto ~3x improvement for 64-bit datatypes (long, 
double) on Intel TigerLake machine as shown in the performance data below.

For parallel sort on random data, this PR shows upto ~3.4x for 32-bit datatypes 
(int, float) and upto ~2.3x for 64-bit datatypes as shown below.

**Note:** This PR also improves the performance of AVX512 sort by upto 35%.

<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40";>

<head>

<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 15">
<link id=Main-File rel=Main-File
href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
<link rel=File-List
href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">


</head>

<body link="#0563C1" vlink="#954F72">



Benchmark (Serial Sort) | Size | Baseline      (us/op) | AVX2     (us/op) | 
Speedup
-- | -- | -- | -- | --
ArraysSort.intSort | 10 | 0.034 | 0.029 | 1.2
ArraysSort.intSort | 25 | 0.088 | 0.044 | 2.0
ArraysSort.intSort | 50 | 0.239 | 0.159 | 1.5
ArraysSort.intSort | 75 | 0.417 | 0.27 | 1.5
ArraysSort.intSort | 100 | 0.572 | 0.265 | 2.2
ArraysSort.intSort | 1000 | 10.098 | 4.282 | 2.4
ArraysSort.intSort | 10000 | 330.065 | 43.383 | 7.6
ArraysSort.intSort | 100000 | 4099.527 | 778.943 | 5.3
ArraysSort.intSort | 1000000 | 49150.16 | 9634.335 | 5.1
ArraysSort.floatSort | 10 | 0.045 | 0.043 | 1.0
ArraysSort.floatSort | 25 | 0.105 | 0.073 | 1.4
ArraysSort.floatSort | 50 | 0.278 | 0.216 | 1.3
ArraysSort.floatSort | 75 | 0.476 | 0.241 | 2.0
ArraysSort.floatSort | 100 | 0.583 | 0.313 | 1.9
ArraysSort.floatSort | 1000 | 10.182 | 4.329 | 2.4
ArraysSort.floatSort | 10000 | 323.136 | 57.175 | 5.7
ArraysSort.floatSort | 100000 | 4299.519 | 862.63 | 5.0
ArraysSort.floatSort | 1000000 | 50889.4 | 10972.19 | 4.6
ArraysSort.longSort | 10 | 0.037 | 0.031 | 1.2
ArraysSort.longSort | 25 | 0.101 | 0.073 | 1.4
ArraysSort.longSort | 50 | 0.227 | 0.219 | 1.0
ArraysSort.longSort | 75 | 0.446 | 0.332 | 1.3
ArraysSort.longSort | 100 | 0.714 | 0.557 | 1.3
ArraysSort.longSort | 1000 | 10.716 | 8.624 | 1.2
ArraysSort.longSort | 10000 | 310.408 | 100.74 | 3.1
ArraysSort.longSort | 100000 | 4151.178 | 1653.963 | 2.5
ArraysSort.longSort | 1000000 | 48813.97 | 20900.69 | 2.3
ArraysSort.doubleSort | 10 | 0.035 | 0.036 | 1.0
ArraysSort.doubleSort | 25 | 0.115 | 0.076 | 1.5
ArraysSort.doubleSort | 50 | 0.274 | 0.214 | 1.3
ArraysSort.doubleSort | 75 | 0.417 | 0.323 | 1.3
ArraysSort.doubleSort | 100 | 0.667 | 0.473 | 1.4
ArraysSort.doubleSort | 1000 | 9.87 | 7.01 | 1.4
ArraysSort.doubleSort | 10000 | 327.84 | 104.979 | 3.1
ArraysSort.doubleSort | 100000 | 4568.306 | 1575.473 | 2.9
ArraysSort.doubleSort | 1000000 | 51173 | 19287.01 | 2.7

</body>

</html>

<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40";>

<head>

<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 15">
<link id=Main-File rel=Main-File
href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
<link rel=File-List
href="file:///C:/Users/sparasa/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">



</head>

<body link="#0563C1" vlink="#954F72">



Benchmark    (Parallel Sort) | Size | Baseline      (us/op) | AVX2     (us/op) 
| Speedup
-- | -- | -- | -- | --
ArraysSort.intParallelSort | 10 | 0.033 | 0.029 | 1.14
ArraysSort.intParallelSort | 25 | 0.087 | 0.043 | 2.02
ArraysSort.intParallelSort | 50 | 0.25 | 0.161 | 1.55
ArraysSort.intParallelSort | 75 | 0.419 | 0.267 | 1.57
ArraysSort.intParallelSort | 100 | 0.555 | 0.262 | 2.12
ArraysSort.intParallelSort | 1000 | 8.287 | 4.259 | 1.95
ArraysSort.intParallelSort | 10000 | 412.983 | 138.082 | 2.99
ArraysSort.intParallelSort | 100000 | 857.161 | 250.89 | 3.42
ArraysSort.intParallelSort | 1000000 | 7961.746 | 3090.108 | 2.58
ArraysSort.floatParallelSort | 10 | 0.047 | 0.043 | 1.09
ArraysSort.floatParallelSort | 25 | 0.104 | 0.075 | 1.39
ArraysSort.floatParallelSort | 50 | 0.277 | 0.215 | 1.29
ArraysSort.floatParallelSort | 75 | 0.466 | 0.243 | 1.92
ArraysSort.floatParallelSort | 100 | 0.587 | 0.322 | 1.82
ArraysSort.floatParallelSort | 1000 | 9.503 | 4.349 | 2.19
ArraysSort.floatParallelSort | 10000 | 534.867 | 214.857 | 2.49
ArraysSort.floatParallelSort | 100000 | 1028.12 | 1022.735 | 1.01
ArraysSort.floatParallelSort | 1000000 | 9537.956 | 4622.919 | 2.06
ArraysSort.longParallelSort | 10 | 0.037 | 0.031 | 1.19
ArraysSort.longParallelSort | 25 | 0.101 | 0.073 | 1.38
ArraysSort.longParallelSort | 50 | 0.235 | 0.223 | 1.05
ArraysSort.longParallelSort | 75 | 0.464 | 0.336 | 1.38
ArraysSort.longParallelSort | 100 | 0.712 | 0.554 | 1.29
ArraysSort.longParallelSort | 1000 | 10.847 | 8.623 | 1.26
ArraysSort.longParallelSort | 10000 | 132.814 | 56.811 | 2.34
ArraysSort.longParallelSort | 100000 | 812.834 | 528.254 | 1.54
ArraysSort.longParallelSort | 1000000 | 8061.741 | 5149.765 | 1.57
ArraysSort.doubleParallelSort | 10 | 0.036 | 0.036 | 1.00
ArraysSort.doubleParallelSort | 25 | 0.117 | 0.081 | 1.44
ArraysSort.doubleParallelSort | 50 | 0.307 | 0.213 | 1.44
ArraysSort.doubleParallelSort | 75 | 0.433 | 0.322 | 1.34
ArraysSort.doubleParallelSort | 100 | 0.669 | 0.482 | 1.39
ArraysSort.doubleParallelSort | 1000 | 11.533 | 7.084 | 1.63
ArraysSort.doubleParallelSort | 10000 | 370.922 | 257.617 | 1.44
ArraysSort.doubleParallelSort | 100000 | 996.256 | 649.087 | 1.53
ArraysSort.doubleParallelSort | 1000000 | 9610.408 | 7131.308 | 1.35



</body>

</html>

-------------

Commit messages:
 - fix jcheck failures due to windows encoding
 - fix carriage return and change insertion sort thresholds
 - fix formatting and white spaces
 - cleanup unused code
 - fix insertion sort thresholds
 - add insertion sort
 - fix headers
 - revert to xss-common-qsort
 - 8319577: x86_64 AVX2 intrinsics for Arrays.sort methods (int, long, float 
and double arrays)

Changes: https://git.openjdk.org/jdk/pull/16534/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16534&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8319577
  Stats: 4158 lines in 16 files changed: 2578 ins; 1420 del; 160 mod
  Patch: https://git.openjdk.org/jdk/pull/16534.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16534/head:pull/16534

PR: https://git.openjdk.org/jdk/pull/16534

Reply via email to