On Wed, 11 Oct 2023 20:58:23 GMT, Srinivas Vamsi Parasa <d...@openjdk.org> 
wrote:

>> The goal of this PR is to address the follow-up comments to the SIMD 
>> accelerated sort PR (#14227) which implemented AVX512 intrinsics for 
>> Arrays.sort() methods.
>> The proposed changes are:
>> 
>> 1) Restriction of the AVX512 sort acceleration to only Intel CPUs. A 
>> performance regression (due to micro-architectural differences) was reported 
>> for AMD Zen4 CPUs in the comments section of PR.
>> 2) Addressing the build failure due to a bug in GCC 12 (which was fixed in 
>> version 12.3.1). The details of the bug are at: 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
>> 3) Minor changes in Javadoc strings
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Revert @ForceInline annotations for small array sort methods

At least on Sapphire Rapids, the [emulation suggested 
here](https://github.com/natmaurice/x86-simd-sort/commit/41d03b2d8f3b62a2ee6a3a97a8da7f193a407026)
 imposes only a roughly 6% penalty for `intSort`, while also mitigating the 
performance issue on Zen 4.
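
For readers unfamiliar with the operation being emulated: as I understand it (the branch name further down references `compressstoreu`), the idea is to replace a single compress-store to memory with a compress into a register followed by a masked store of the packed lanes. The scalar Java sketch below only illustrates why the two forms produce the same result. It is not the code from the linked commit, and the class and method names are hypothetical.

```java
// Scalar illustration only; the real code uses AVX-512 intrinsics in C++.
final class CompressStoreSketch {
    static final int LANES = 16; // one 512-bit vector of 32-bit ints

    // Direct form: lanes selected by the mask are written contiguously to dst.
    static int compressStore(int[] dst, int offset, int mask, int[] lanes) {
        int written = 0;
        for (int i = 0; i < LANES; i++) {
            if ((mask & (1 << i)) != 0) {
                dst[offset + written++] = lanes[i];
            }
        }
        return written;
    }

    // Two-step form: pack the selected lanes into a temporary "register",
    // then store only the low `count` lanes using a second mask.
    static int compressThenStore(int[] dst, int offset, int mask, int[] lanes) {
        int[] packed = new int[LANES];
        int count = 0;
        for (int i = 0; i < LANES; i++) {
            if ((mask & (1 << i)) != 0) {
                packed[count++] = lanes[i];
            }
        }
        int storeMask = (1 << count) - 1; // low `count` bits set
        for (int i = 0; i < LANES; i++) {
            if ((storeMask & (1 << i)) != 0) {
                dst[offset + i] = packed[i];
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] lanes = new int[LANES];
        for (int i = 0; i < LANES; i++) {
            lanes[i] = i * 10;
        }
        int mask = 0b1010_0000_0000_0101; // selects lanes 0, 2, 13, 15
        int[] a = new int[LANES];
        int[] b = new int[LANES];
        int na = compressStore(a, 0, mask, lanes);
        int nb = compressThenStore(b, 0, mask, lanes);
        System.out.println(na + " " + nb + " " + java.util.Arrays.equals(a, b));
    }
}
```

Both methods leave untouched any destination element beyond the packed lanes, which is what lets the second form stand in for the first.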


Configuration summary:
* Name:           linux-x86_64-server-release
* Debug level:    release
* HS debug level: product
* JVM variants:   server
* JVM features:   server: 'cds compiler1 compiler2 epsilongc g1gc jfr jni-check 
jvmci jvmti management parallelgc serialgc services shenandoahgc vm-structs zgc'
* OpenJDK target: OS: linux, CPU architecture: x86, address length: 64
* Version string: 22-internal-adhoc.nfsuper.jdk (22-internal)
* Source date:    1697078366 (2023-10-12T02:39:26Z)

Tools summary:
* Boot JDK:       openjdk version "21" 2023-09-19 OpenJDK Runtime Environment 
Zulu21.28+86-SA (build 21+35) OpenJDK 64-Bit Server VM Zulu21.28+86-SA (build 
21+35, mixed mode, sharing) (at /usr/lib/jvm/zulu-21-amd64)
* Toolchain:      gcc (GNU Compiler Collection)
* C Compiler:     Version 11.4.0 (at /usr/bin/gcc)
* C++ Compiler:   Version 11.4.0 (at /usr/bin/g++)


https://github.com/openjdk/jdk/compare/master...DanielThomas:jdk:dannyt/emulate-compressstoreu?expand=1

## Intel(R) Xeon(R) Platinum 8488C - Current


Benchmark            (size)  Mode  Cnt     Score     Error  Units
ArraysSort.intSort       10  avgt    3     0.043 ±   0.006  us/op
ArraysSort.intSort       25  avgt    3     0.082 ±   0.002  us/op
ArraysSort.intSort       50  avgt    3     0.205 ±   0.022  us/op
ArraysSort.intSort       75  avgt    3     0.394 ±   0.048  us/op
ArraysSort.intSort      100  avgt    3     0.625 ±   0.003  us/op
ArraysSort.intSort     1000  avgt    3     5.759 ±   1.111  us/op
ArraysSort.intSort    10000  avgt    3    51.680 ±   3.568  us/op
ArraysSort.intSort   100000  avgt    3   777.339 ±  25.809  us/op
ArraysSort.intSort  1000000  avgt    3  8848.261 ± 954.475  us/op


## Intel(R) Xeon(R) Platinum 8488C - Emulated


Benchmark            (size)  Mode  Cnt     Score     Error  Units
ArraysSort.intSort       10  avgt    3     0.046 ±   0.002  us/op
ArraysSort.intSort       25  avgt    3     0.083 ±   0.004  us/op
ArraysSort.intSort       50  avgt    3     0.214 ±   0.022  us/op
ArraysSort.intSort       75  avgt    3     0.411 ±   0.038  us/op
ArraysSort.intSort      100  avgt    3     0.658 ±   0.022  us/op
ArraysSort.intSort     1000  avgt    3     6.411 ±   0.497  us/op
ArraysSort.intSort    10000  avgt    3    55.996 ±   3.155  us/op
ArraysSort.intSort   100000  avgt    3   822.805 ±  40.223  us/op
ArraysSort.intSort  1000000  avgt    3  9487.974 ± 216.146  us/op
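
For reference, the 6% penalty quoted at the top can be reproduced from the Current and Emulated scores above. This is a minimal sketch, assuming the penalty is taken as the arithmetic mean of the per-size relative slowdowns. That averaging choice and the class name `EmulationOverhead` are my assumptions, not something stated in the PR.

```java
// Scores copied from the "Current" and "Emulated" tables for the Xeon 8488C (us/op).
final class EmulationOverhead {
    static final int[] SIZES = {10, 25, 50, 75, 100, 1000, 10000, 100000, 1000000};
    static final double[] CURRENT = {
        0.043, 0.082, 0.205, 0.394, 0.625, 5.759, 51.680, 777.339, 8848.261};
    static final double[] EMULATED = {
        0.046, 0.083, 0.214, 0.411, 0.658, 6.411, 55.996, 822.805, 9487.974};

    // Relative slowdown of `emulated` over `current`, in percent.
    static double overheadPercent(double current, double emulated) {
        return (emulated - current) / current * 100.0;
    }

    // Arithmetic mean of the per-size slowdowns (assumed aggregation).
    static double meanOverheadPercent(double[] current, double[] emulated) {
        double sum = 0.0;
        for (int i = 0; i < current.length; i++) {
            sum += overheadPercent(current[i], emulated[i]);
        }
        return sum / current.length;
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZES.length; i++) {
            System.out.printf("size %7d: %5.1f%%%n",
                    SIZES[i], overheadPercent(CURRENT[i], EMULATED[i]));
        }
        System.out.printf("mean: %.1f%%%n", meanOverheadPercent(CURRENT, EMULATED));
    }
}
```

With these inputs the mean comes out at roughly 6.1%, with the largest per-size gap at size 1000 (about 11%). The error bars overlap for several sizes, so the per-size numbers should be read loosely.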


## Intel(R) Xeon(R) Platinum 8488C - Baseline


Benchmark            (size)  Mode  Cnt      Score      Error  Units
ArraysSort.intSort       10  avgt    3      0.047 ±    0.006  us/op
ArraysSort.intSort       25  avgt    3      0.099 ±    0.022  us/op
ArraysSort.intSort       50  avgt    3      0.249 ±    0.024  us/op
ArraysSort.intSort       75  avgt    3      0.438 ±    0.046  us/op
ArraysSort.intSort      100  avgt    3      0.590 ±    0.079  us/op
ArraysSort.intSort     1000  avgt    3      8.384 ±    1.852  us/op
ArraysSort.intSort    10000  avgt    3    435.589 ±   23.647  us/op
ArraysSort.intSort   100000  avgt    3   5380.658 ±  491.435  us/op
ArraysSort.intSort  1000000  avgt    3  63857.189 ± 2746.106  us/op


## AMD EPYC 9R14 - Emulated


$ make test TEST="micro:java.lang.ArraysSort.intSort"

Benchmark            (size)  Mode  Cnt     Score     Error  Units
ArraysSort.intSort       10  avgt    3     0.032 ±   0.001  us/op
ArraysSort.intSort       25  avgt    3     0.067 ±   0.002  us/op
ArraysSort.intSort       50  avgt    3     0.196 ±   0.002  us/op
ArraysSort.intSort       75  avgt    3     0.429 ±   0.046  us/op
ArraysSort.intSort      100  avgt    3     0.614 ±   0.025  us/op
ArraysSort.intSort     1000  avgt    3     6.500 ±   0.084  us/op
ArraysSort.intSort    10000  avgt    3    55.620 ±   0.943  us/op
ArraysSort.intSort   100000  avgt    3   669.347 ±  75.432  us/op
ArraysSort.intSort  1000000  avgt    3  9459.001 ± 201.298  us/op
Finished running test 'micro:java.lang.ArraysSort.intSort'


## AMD EPYC 9R14 - Baseline


$ make test TEST="micro:java.lang.ArraysSort.intSort" MICRO="VM_OPTIONS=-XX:UseAVX=2"

Benchmark            (size)  Mode  Cnt      Score      Error  Units
ArraysSort.intSort       10  avgt    3      0.035 ±    0.016  us/op
ArraysSort.intSort       25  avgt    3      0.091 ±    0.009  us/op
ArraysSort.intSort       50  avgt    3      0.245 ±    0.002  us/op
ArraysSort.intSort       75  avgt    3      0.412 ±    0.004  us/op
ArraysSort.intSort      100  avgt    3      0.531 ±    0.003  us/op
ArraysSort.intSort     1000  avgt    3      8.803 ±    0.609  us/op
ArraysSort.intSort    10000  avgt    3    254.413 ±  153.004  us/op
ArraysSort.intSort   100000  avgt    3   4485.811 ±   17.517  us/op
ArraysSort.intSort  1000000  avgt    3  56552.132 ± 3124.280  us/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16124#issuecomment-1758865865
