On Wed, 11 Oct 2023 20:58:23 GMT, Srinivas Vamsi Parasa <d...@openjdk.org> wrote:
>> The goal of this PR is to address the follow-up comments to the SIMD
>> accelerated sort PR (#14227) which implemented AVX512 intrinsics for
>> Arrays.sort() methods.
>> The proposed changes are:
>>
>> 1) Restriction of the AVX512 sort acceleration to only Intel CPUs. A
>> performance regression (due to micro-architectural differences) was reported
>> for AMD Zen4 CPUs in the comments section of PR.
>> 2) Addressing the build failure due to a bug in GCC 12 (which was fixed in
>> version 12.3.1). The details of the bug are at:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
>> 3) Minor changes in Javadoc strings
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one
> additional commit since the last revision:
>
>   Revert @ForceInline annotations for small array sort methods

At least on Sapphire Rapids, the [emulation suggested here](https://github.com/natmaurice/x86-simd-sort/commit/41d03b2d8f3b62a2ee6a3a97a8da7f193a407026) imposes only a 6% penalty for `intSort`, while also mitigating the performance issue on Zen 4.
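As a rough cross-check of the penalty figure, the sketch below compares the "Emulated" and "Current" `intSort` scores reported further down for the Xeon Platinum 8488C. The comment does not say how the 6% was derived, so using a geometric mean of the per-size ratios is an assumption here, and the class and method names are hypothetical.

```java
import java.util.Locale;

public class EmulationOverhead {
    // Array sizes and avgt scores (us/op) copied from the Xeon Platinum 8488C
    // "Current" and "Emulated" tables in this comment.
    static final int[] SIZES = {10, 25, 50, 75, 100, 1000, 10000, 100000, 1000000};
    static final double[] CURRENT = {
        0.043, 0.082, 0.205, 0.394, 0.625, 5.759, 51.680, 777.339, 8848.261};
    static final double[] EMULATED = {
        0.046, 0.083, 0.214, 0.411, 0.658, 6.411, 55.996, 822.805, 9487.974};

    // Emulated time divided by current time for one array size (> 1 means slower).
    static double ratio(int i) {
        return EMULATED[i] / CURRENT[i];
    }

    // Geometric mean of the per-size ratios (an assumed way to summarize them).
    static double geometricMeanRatio() {
        double logSum = 0.0;
        for (int i = 0; i < SIZES.length; i++) {
            logSum += Math.log(ratio(i));
        }
        return Math.exp(logSum / SIZES.length);
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZES.length; i++) {
            System.out.printf(Locale.ROOT, "%8d  %+.1f%%%n", SIZES[i], (ratio(i) - 1.0) * 100.0);
        }
        System.out.printf(Locale.ROOT, " geomean  %+.1f%%%n", (geometricMeanRatio() - 1.0) * 100.0);
    }
}
```

With only three iterations per data point and error bars of several percent on some rows, the per-size differences are within noise for the smaller arrays, so the summary figure is best read as approximate.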

Configuration summary:
* Name:           linux-x86_64-server-release
* Debug level:    release
* HS debug level: product
* JVM variants:   server
* JVM features:   server: 'cds compiler1 compiler2 epsilongc g1gc jfr jni-check jvmci jvmti management parallelgc serialgc services shenandoahgc vm-structs zgc'
* OpenJDK target: OS: linux, CPU architecture: x86, address length: 64
* Version string: 22-internal-adhoc.nfsuper.jdk (22-internal)
* Source date:    1697078366 (2023-10-12T02:39:26Z)

Tools summary:
* Boot JDK:       openjdk version "21" 2023-09-19 OpenJDK Runtime Environment Zulu21.28+86-SA (build 21+35) OpenJDK 64-Bit Server VM Zulu21.28+86-SA (build 21+35, mixed mode, sharing) (at /usr/lib/jvm/zulu-21-amd64)
* Toolchain:      gcc (GNU Compiler Collection)
* C Compiler:     Version 11.4.0 (at /usr/bin/gcc)
* C++ Compiler:   Version 11.4.0 (at /usr/bin/g++)

https://github.com/openjdk/jdk/compare/master...DanielThomas:jdk:dannyt/emulate-compressstoreu?expand=1

## Intel(R) Xeon(R) Platinum 8488C - Current

    Benchmark            (size)  Mode  Cnt      Score      Error  Units
    ArraysSort.intSort       10  avgt    3      0.043 ±    0.006  us/op
    ArraysSort.intSort       25  avgt    3      0.082 ±    0.002  us/op
    ArraysSort.intSort       50  avgt    3      0.205 ±    0.022  us/op
    ArraysSort.intSort       75  avgt    3      0.394 ±    0.048  us/op
    ArraysSort.intSort      100  avgt    3      0.625 ±    0.003  us/op
    ArraysSort.intSort     1000  avgt    3      5.759 ±    1.111  us/op
    ArraysSort.intSort    10000  avgt    3     51.680 ±    3.568  us/op
    ArraysSort.intSort   100000  avgt    3    777.339 ±   25.809  us/op
    ArraysSort.intSort  1000000  avgt    3   8848.261 ±  954.475  us/op

## Intel(R) Xeon(R) Platinum 8488C - Emulated

    Benchmark            (size)  Mode  Cnt      Score      Error  Units
    ArraysSort.intSort       10  avgt    3      0.046 ±    0.002  us/op
    ArraysSort.intSort       25  avgt    3      0.083 ±    0.004  us/op
    ArraysSort.intSort       50  avgt    3      0.214 ±    0.022  us/op
    ArraysSort.intSort       75  avgt    3      0.411 ±    0.038  us/op
    ArraysSort.intSort      100  avgt    3      0.658 ±    0.022  us/op
    ArraysSort.intSort     1000  avgt    3      6.411 ±    0.497  us/op
    ArraysSort.intSort    10000  avgt    3     55.996 ±    3.155  us/op
    ArraysSort.intSort   100000  avgt    3    822.805 ±   40.223  us/op
    ArraysSort.intSort  1000000  avgt    3   9487.974 ±  216.146  us/op

## Intel(R) Xeon(R) Platinum 8488C - Baseline

    Benchmark            (size)  Mode  Cnt      Score      Error  Units
    ArraysSort.intSort       10  avgt    3      0.047 ±    0.006  us/op
    ArraysSort.intSort       25  avgt    3      0.099 ±    0.022  us/op
    ArraysSort.intSort       50  avgt    3      0.249 ±    0.024  us/op
    ArraysSort.intSort       75  avgt    3      0.438 ±    0.046  us/op
    ArraysSort.intSort      100  avgt    3      0.590 ±    0.079  us/op
    ArraysSort.intSort     1000  avgt    3      8.384 ±    1.852  us/op
    ArraysSort.intSort    10000  avgt    3    435.589 ±   23.647  us/op
    ArraysSort.intSort   100000  avgt    3   5380.658 ±  491.435  us/op
    ArraysSort.intSort  1000000  avgt    3  63857.189 ± 2746.106  us/op

## AMD EPYC 9R14 - Emulated

    $ make test TEST="micro:java.lang.ArraysSort.intSort"

    Benchmark            (size)  Mode  Cnt      Score      Error  Units
    ArraysSort.intSort       10  avgt    3      0.032 ±    0.001  us/op
    ArraysSort.intSort       25  avgt    3      0.067 ±    0.002  us/op
    ArraysSort.intSort       50  avgt    3      0.196 ±    0.002  us/op
    ArraysSort.intSort       75  avgt    3      0.429 ±    0.046  us/op
    ArraysSort.intSort      100  avgt    3      0.614 ±    0.025  us/op
    ArraysSort.intSort     1000  avgt    3      6.500 ±    0.084  us/op
    ArraysSort.intSort    10000  avgt    3     55.620 ±    0.943  us/op
    ArraysSort.intSort   100000  avgt    3    669.347 ±   75.432  us/op
    ArraysSort.intSort  1000000  avgt    3   9459.001 ±  201.298  us/op
    Finished running test 'micro:java.lang.ArraysSort.intSort'

## AMD EPYC 9R14 - Baseline

    $ make test TEST="micro:java.lang.ArraysSort.intSort" MICRO="VM_OPTIONS=-XX:UseAVX=2"

    Benchmark            (size)  Mode  Cnt      Score      Error  Units
    ArraysSort.intSort       10  avgt    3      0.035 ±    0.016  us/op
    ArraysSort.intSort       25  avgt    3      0.091 ±    0.009  us/op
    ArraysSort.intSort       50  avgt    3      0.245 ±    0.002  us/op
    ArraysSort.intSort       75  avgt    3      0.412 ±    0.004  us/op
    ArraysSort.intSort      100  avgt    3      0.531 ±    0.003  us/op
    ArraysSort.intSort     1000  avgt    3      8.803 ±    0.609  us/op
    ArraysSort.intSort    10000  avgt    3    254.413 ±  153.004  us/op
    ArraysSort.intSort   100000  avgt    3   4485.811 ±   17.517  us/op
    ArraysSort.intSort  1000000  avgt    3  56552.132 ± 3124.280  us/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16124#issuecomment-1758865865