On Fri, 25 Aug 2023 18:59:47 GMT, Srinivas Vamsi Parasa <d...@openjdk.org> wrote:
>>> Improvements are nice but it would not pay off if you have big regressions. >>> I can accept 0.9x but 0.4x - 0.8x regressions should be investigated and >>> implementation adjusted to avoid them. >> >> Hello Vladimir (@vnkozlov) , >> >> As per your suggestion, the implementation was adjusted to address the >> regressions caused for STAGGER and REPEATED type of input data patterns. >> Please see below the new JMH performance data using the adjusted >> implementation. >> >> In the new implementation, we don't call the AVX512 sort intrinsic at the >> top level (`Arrays.sort()`) . Instead, we take a decomposed or a 'building >> blocks' approach where we only intrinsify (using AVX512 instructions) the >> partitioning and small array sort functions used in the current baseline ( >> `DualPivotQuickSort.sort()` ). Since the current baseline has logic to >> detect and sort special input patterns like STAGGER, we fallback to the >> current baseline instead of using AVX512 partitioning and sorting (which >> works best for RANDOM, REPEATED and SHUFFLE patterns). >> >> Data below shows `Arrays.sort()` performance comparison between the current >> **Java baseline (DPQS)** vs. **AVX512 sort** (this PR) using the >> `ArraysSort.java` JMH >> [benchmark](https://github.com/openjdk/jdk/pull/13568/files#diff-dee51b13bd1872ff455cec2f29255cfd25014a5dd33dda55a2fc68638c3dd4b2) >> provided in the PR for [JDK-8266431: Dual-Pivot Quicksort improvements >> (Radix sort)](https://github.com/openjdk/jdk/pull/13568/files#top) ( #13568) >> >> - The following command line was used to run the benchmarks: ` java -jar >> $JDK_HOME/build/linux-x86_64-server-release/images/test/micro/benchmarks.jar >> -jvmArgs "-XX:CompileThreshold=1 -XX:-TieredCompilation" ArraysSort` >> - The scores shown are the average time (us/op), thus lower is better. The >> last column towards the right shows the speedup. >> >> >> | Benchmark | Mode | Size | Baseline DPQS >> (us/op) | AVX512 partitioning & sort (us/op) | Speedup | >> | --- | --- | --- | --- | --- >> | --- | >> | ArraysSort.Double.testSort | RANDOM | 800 | >> 6.7 | 4.8 | 1.39x | >> | ArraysSort.Double.testSort | RANDOM | 7000 | >> 234.1 | 51.5 | **4.55x** | >> | ArraysSort.Double.testSort | RANDOM | 50000 | >> 2155.9 | 470.0 | **4.59x** | >> | ArraysSort.Double.testSort | RANDOM | 300000 | >> 15076.3 | 3391.3 | **4.45x** | >> | ArraysSort.Double.testSort | RANDOM | 2000000 | >> 116445.5 | 27491.7 | **4.24x** | >> | ArraysSort.Double.testSort | REPEATED | 800 >> | 2.3 | 1.7 | 1.35x | >> | ArraysSort.Double.testSort | REPEATED | 7000 >> | 23.3 | 12... > >> @vamsi-parasa I submitted our testing of latest v28 version. It found issue >> in `ArraysSort.java` new benchmark file. You missed `,`after year in >> copyright line: >> >> ``` >> * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. >> ``` > > Thank you, Vladimir! Hi @vamsi-parasa , I did some runs on linux by disabling intrinsification of _arraySort and _arrayPartition since these are not handled for windows and non-x86 targets, just to measure the impact of java side code re-factoring. ![image](https://github.com/openjdk/jdk/assets/59989778/2866e684-955d-4902-af61-e08ac3f548d9) ![image](https://github.com/openjdk/jdk/assets/59989778/0a5e3265-b821-4a43-874e-6f1501423a07) ![image](https://github.com/openjdk/jdk/assets/59989778/f192f259-36e6-4e3e-8082-5cb0c05c9f10) Please find the results, may I request you to kindly verify performance number on windows to be sure that there are no performance degradation. Best Regards, Jatin ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1696209731