On Sat, 29 Mar 2025 00:58:59 GMT, Vladimir Ivanov <[email protected]> wrote:
> Build and use SLEEF library as a backend implementation for Vector API
> trigonometric functions on macosx-aarch64 platform.
>
> It improves raw throughput and eliminates the GC overhead of
> non-intrinsified Vector API operations.
>
> The PR includes build changes and relocates the libsleef sources from
> `src/jdk.incubator.vector/linux/native/` to
> `src/jdk.incubator.vector/share/native/`.
>
> Once the libsleef library is present, the existing code in
> `stubGenerator_aarch64.cpp` links successfully at JVM startup.
>
> Testing: hs-tier1 - hs-tier4, microbenchmarks

Microbenchmark results on Apple M1 Pro:

Benchmark             | Throughput (us/op)              | Allocation rate (MB/sec)
                      | Before         | After          | Before             | After
======================|================|================|====================|==============
Float128Vector.ACOS   |  3.856 ± 0.013 |  1.941 ± 0.008 | 6076.461 ±  20.067 | 0.007 ± 0.001
Float128Vector.ASIN   |  3.813 ± 0.014 |  1.512 ± 0.017 | 6145.040 ±  22.824 | 0.007 ± 0.001
Float128Vector.ATAN   |  7.124 ± 0.040 |  2.220 ± 0.003 | 3289.059 ±  18.539 | 0.007 ± 0.001
Float128Vector.ATAN2  | 16.983 ± 1.031 |  3.412 ± 0.038 | 2075.808 ± 127.179 | 0.007 ± 0.001
Float128Vector.CBRT   |  6.431 ± 0.014 |  4.075 ± 0.011 | 3643.789 ±   7.933 | 0.007 ± 0.001
Float128Vector.COS    |  8.269 ± 0.094 |  5.614 ± 0.026 | 2833.915 ±  32.041 | 0.007 ± 0.001
Float128Vector.COSH   |  5.779 ± 0.020 |  3.072 ± 0.010 | 4054.800 ±  14.028 | 0.007 ± 0.001
Float128Vector.EXP    |  5.456 ± 0.006 |  0.936 ± 0.004 | 4294.853 ±   5.025 | 0.007 ± 0.001
Float128Vector.EXPM1  |  6.888 ± 0.059 |  2.972 ± 0.010 | 3402.363 ±  28.694 | 0.007 ± 0.001
Float128Vector.HYPOT  |  6.369 ± 0.013 |  2.213 ± 0.008 | 5519.051 ±  11.103 | 0.007 ± 0.001
Float128Vector.LOG    |  8.469 ± 0.574 |  1.729 ± 0.004 | 2775.039 ± 157.629 | 0.007 ± 0.001
Float128Vector.LOG10  | 15.235 ± 1.039 |  1.830 ± 0.006 | 1544.009 ± 107.436 | 0.007 ± 0.001
Float128Vector.LOG1P  |  8.823 ± 0.040 |  1.745 ± 0.014 | 2655.757 ±  11.964 | 0.007 ± 0.001
Float128Vector.POW    | 27.511 ± 0.918 |  7.467 ± 0.033 | 1278.693 ±  42.538 | 0.007 ± 0.001
Float128Vector.SIN    |  7.846 ± 0.063 |  5.822 ± 0.015 | 2986.480 ±  24.025 | 0.007 ± 0.001
Float128Vector.SINH   |  5.747 ± 0.033 |  3.206 ± 0.034 | 4077.645 ±  23.305 | 0.007 ± 0.001
Float128Vector.TAN    | 22.337 ± 0.533 |  6.114 ± 0.016 | 1049.469 ±  24.969 | 0.007 ± 0.001
Double128Vector.ACOS  |  5.789 ± 0.107 |  4.635 ± 0.013 | 8097.069 ± 146.593 | 0.007 ± 0.001
Double128Vector.ASIN  |  5.655 ± 0.011 |  3.858 ± 0.017 | 8287.521 ±  16.023 | 0.007 ± 0.001
Double128Vector.ATAN  | 10.082 ± 0.046 |  6.016 ± 0.016 | 4648.068 ±  21.401 | 0.007 ± 0.001
Double128Vector.ATAN2 | 17.286 ± 0.113 |  8.148 ± 0.015 | 4067.019 ±  26.586 | 0.007 ± 0.001
Double128Vector.CBRT  |  9.779 ± 0.048 |  8.861 ± 0.045 | 4792.419 ±  23.381 | 0.007 ± 0.001
Double128Vector.COS   |  9.071 ± 0.107 |  6.948 ± 0.027 | 5166.999 ±  59.377 | 0.007 ± 0.001
Double128Vector.COSH  |  8.234 ± 0.030 |  6.403 ± 0.025 | 5692.144 ±  20.625 | 0.007 ± 0.001
Double128Vector.EXP   |  7.506 ± 0.012 |  3.073 ± 0.013 | 6243.783 ±  10.382 | 0.007 ± 0.001
Double128Vector.EXPM1 |  9.122 ± 0.036 |  6.122 ± 0.036 | 5137.721 ±  20.350 | 0.007 ± 0.001
Double128Vector.HYPOT | 13.445 ± 0.248 |  4.596 ± 0.035 | 5229.977 ±  96.222 | 0.007 ± 0.001
Double128Vector.LOG   | 10.396 ± 0.042 |  4.629 ± 0.081 | 4507.928 ±  18.101 | 0.007 ± 0.001
Double128Vector.LOG10 | 13.923 ± 0.046 |  4.889 ± 0.021 | 3365.944 ±  11.078 | 0.007 ± 0.001
Double128Vector.LOG1P | 12.336 ± 0.045 |  5.010 ± 0.027 | 3799.204 ±  13.816 | 0.007 ± 0.001
Double128Vector.POW   | 28.852 ± 0.043 | 15.270 ± 0.081 | 2436.503 ±   3.647 | 0.007 ± 0.001
Double128Vector.SIN   |  8.821 ± 0.018 |  6.309 ± 0.037 | 5313.077 ±  11.056 | 0.007 ± 0.001
Double128Vector.SINH  |  8.289 ± 0.037 |  6.566 ± 0.029 | 5654.264 ±  25.538 | 0.007 ± 0.001
Double128Vector.TAN   | 25.535 ± 0.636 |  9.788 ± 0.036 | 1836.177 ±  44.430 | 0.007 ± 0.001
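
For readers who want the speedup factors rather than raw timings, here is a minimal sketch (not part of the PR) that divides the "Before" time by the "After" time for three rows copied from the results above. The class name `SpeedupSketch` and the method `speedup` are hypothetical, and the error margins are ignored for simplicity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SpeedupSketch {
    // Ratio of mean time per operation before vs. after the change.
    // A value above 1 means the "After" run needs less time per operation.
    static double speedup(double beforeUsPerOp, double afterUsPerOp) {
        return beforeUsPerOp / afterUsPerOp;
    }

    public static void main(String[] args) {
        // {before, after} mean times in us/op, copied from the reported results.
        Map<String, double[]> rows = new LinkedHashMap<>();
        rows.put("Float128Vector.EXP", new double[] {5.456, 0.936});
        rows.put("Float128Vector.LOG10", new double[] {15.235, 1.830});
        rows.put("Double128Vector.CBRT", new double[] {9.779, 8.861});

        for (Map.Entry<String, double[]> e : rows.entrySet()) {
            double[] t = e.getValue();
            System.out.printf("%-22s %.2fx%n", e.getKey(), speedup(t[0], t[1]));
        }
    }
}
```

The three rows are chosen only to show the spread between a large improvement and a small one; the same division applies to every other row.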
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24306#issuecomment-2762959907