Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v17]

2021-06-03 Thread Sandhya Viswanathan
> This PR contains Short Vector Math Library support related changes for
> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414),
> in preparation for when the JEP is targeted.
>
> Intel Short Vector Math Library (SVML) based intrinsics in native x86
> assembly provide optimized implementations of the Vector API transcendental and
> trigonometric methods.
> These methods are built into a separate library instead of being part of
> libjvm.so or jvm.dll.
> 
> The following changes are made:
> * The source for these methods is placed in the jdk.incubator.vector module
>   under src/jdk.incubator.vector/linux/native/libsvml and
>   src/jdk.incubator.vector/windows/native/libsvml.
> * The assembly source files are named “*.S” and include files are named
>   “*.S.inc”.
> * The corresponding build script is placed at
>   make/modules/jdk.incubator.vector/Lib.gmk.
> * Changes are made to the build system to support dependency tracking for
>   assembly files with includes.
> * The built native libraries (libsvml.so/svml.dll) are placed in the bin
>   directory of the JDK on Windows and the lib directory of the JDK on Linux.
> * The C2 JIT uses dll_load and dll_lookup to get the addresses of
>   optimized methods from this library.
> 
> Build system changes and module library build scripts are contributed by 
> Magnus (magnus.ihse.bur...@oracle.com).
> 
> Looking forward to your review and feedback.
> 
> Performance:
> Micro benchmark          Base  Optimized  Unit      Gain (Optimized/Base)
> Double128Vector.ACOS    45.91      87.34  ops/ms    1.90
> Double128Vector.ASIN    45.06      92.36  ops/ms    2.05
> Double128Vector.ATAN    19.92     118.36  ops/ms    5.94
> Double128Vector.ATAN2   15.24      88.17  ops/ms    5.79
> Double128Vector.CBRT    45.77     208.36  ops/ms    4.55
> Double128Vector.COS     49.94     245.89  ops/ms    4.92
> Double128Vector.COSH    26.91     126.00  ops/ms    4.68
> Double128Vector.EXP     71.64     379.65  ops/ms    5.30
> Double128Vector.EXPM1   35.95     150.37  ops/ms    4.18
> Double128Vector.HYPOT   50.67     174.10  ops/ms    3.44
> Double128Vector.LOG     61.95     279.84  ops/ms    4.52
> Double128Vector.LOG10   59.34     239.05  ops/ms    4.03
> Double128Vector.LOG1P   18.56     200.32  ops/ms   10.79
> Double128Vector.SIN     49.36     240.79  ops/ms    4.88
> Double128Vector.SINH    26.59     103.75  ops/ms    3.90
> Double128Vector.TAN     41.05     152.39  ops/ms    3.71
> Double128Vector.TANH    45.29     169.53  ops/ms    3.74
> Double256Vector.ACOS    54.21     106.39  ops/ms    1.96
> Double256Vector.ASIN    53.60     107.99  ops/ms    2.01
> Double256Vector.ATAN    21.53     189.11  ops/ms    8.78
> Double256Vector.ATAN2   16.67     140.76  ops/ms    8.44
> Double256Vector.CBRT    56.45     397.13  ops/ms    7.04
> Double256Vector.COS     58.26     389.77  ops/ms    6.69
> Double256Vector.COSH    29.44     151.11  ops/ms    5.13
> Double256Vector.EXP     86.67     564.68  ops/ms    6.52
> Double256Vector.EXPM1   41.96     201.28  ops/ms    4.80
> Double256Vector.HYPOT   66.18     305.74  ops/ms    4.62
> Double256Vector.LOG     71.52     394.90  ops/ms    5.52
> Double256Vector.LOG10   65.43     362.32  ops/ms    5.54
> Double256Vector.LOG1P   19.99     300.88  ops/ms   15.05
> Double256Vector.SIN     57.06     380.98  ops/ms    6.68
> Double256Vector.SINH    29.40     117.37  ops/ms    3.99
> Double256Vector.TAN     44.90     279.90  ops/ms    6.23
> Double256Vector.TANH    54.08     274.71  ops/ms    5.08
> Double512Vector.ACOS    55.65     687.54  ops/ms   12.35
> Double512Vector.ASIN    57.31     777.72  ops/ms   13.57
> Double512Vector.ATAN    21.42     729.21  ops/ms   34.04
> Double512Vector.ATAN2   16.37     414.33  ops/ms   25.32
> Double512Vector.CBRT    56.78     834.38  ops/ms   14.69
> Double512Vector.COS     59.88     837.04  ops/ms   13.98
> Double512Vector.COSH    30.34     172.76  ops/ms    5.70
> Double512Vector.EXP     99.66    1608.12  ops/ms   16.14
> Double512Vector.EXPM1   43.39     318.61  ops/ms    7.34
> Double512Vector.HYPOT   73.87    1502.72  ops/ms   20.34
> Double512Vector.LOG     74.84     996.00  ops/ms   13.31
> Double512Vector.LOG10   71.12    1046.52  ops/ms   14.72
> Double512Vector.LOG1P   19.75     776.87  ops/ms   39.34
> Double512Vector.POW     37.42     384.13  ops/ms   10.26
> Double512Vector.SIN     59.74     728.45  ops/ms   12.19
> Double512Vector.SINH    29.47     143.38  ops/ms    4.87
> Double512Vector.TAN     46.20     587.21  ops/ms   12.71
> Double512Vector.TANH    57.36     495.42  ops/ms    8.64
> Double64Vector.ACOS     24.04      73.67  ops/ms    3.06
> Double64Vector.ASIN     23.78      75.11  ops/ms    3.16
> Double64Vector.ATAN     14.14      62.81  ops/ms    4.44
> Double64Vector.ATAN2    10.38      44.43  ops/ms    4.28
> Double64Vector.CBRT     16.47     107.50  ops/ms    6.53
> Double64Vector.COS      23.42     152.01  ops/ms    6.49
> Double64Vector.COSH     17.34     113.34  ops/ms    6.54
> Double64Vector.EXP      27.08     203.53  ops/ms    7.52
> Double64Vector.EXPM1    18.77      96.73  ops/ms    5.15
> Double64Vector.HYPOT    18.54     103.62  ops/ms    5.59
> Double64Vector.LOG      26.75     142.63  ops/ms    5.33
> Double64Vector.LOG10    25.85     139.71  ops/ms    5.40
> Double64Vector.LOG1P    13.26      97.94  ops/ms    7.38
> Double64Vector.SIN      23.28     146.91  ops/ms    6.31
> Double64Vector.SINH     17.62      88.59  ops/ms    5.03
> Double64Vector.TAN      21.00      86.43  ops/ms    4.12
> Double64Vector.TANH     23.75     111.35  ops/ms    4.69
> Float128Vector.ACOS     57.52     110.65  ops/ms    1.92
> Float128Vector.ASIN     57.15     117.95  ops/ms    2.06
> Float128Vector.ATAN     22.52     318.74  ops/ms   14.15
> Float128Vector.ATAN2    17.06     246.07  ops/ms   14.42
> Float128Vecto
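
The Gain column in the performance table is the Optimized throughput divided by the Base throughput. As a quick arithmetic check (a sketch that is not part of the PR; the names `rows` and `gain` are illustrative, and only four rows copied from the table are used), the ratio can be recomputed like this:

```python
# Four rows copied from the table: (benchmark, base ops/ms, optimized ops/ms, reported gain).
rows = [
    ("Double128Vector.ACOS", 45.91, 87.34, 1.90),
    ("Double128Vector.LOG1P", 18.56, 200.32, 10.79),
    ("Double512Vector.ATAN", 21.42, 729.21, 34.04),
    ("Double512Vector.LOG1P", 19.75, 776.87, 39.34),
]


def gain(base, optimized):
    """Speedup factor: optimized throughput divided by base throughput."""
    return optimized / base


# Recompute the gain for each row and keep it next to the reported value.
checked = [(name, gain(base, opt), reported) for name, base, opt, reported in rows]

for name, computed, reported in checked:
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```

Because both columns are throughput (ops/ms), a gain above 1 means the SVML-backed path is faster than the base path.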

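The description says the C2 JIT obtains the addresses of the optimized methods through dll_load and dll_lookup. The sketch below illustrates only the general load-then-lookup pattern, using Python's `ctypes` against the C math library rather than libsvml. It is not HotSpot code: the helper names `load_library` and `lookup_double_fn` are hypothetical, and a POSIX system is assumed.

```python
import ctypes
import ctypes.util
import math


def load_library(name):
    """Load a shared library by short name (the 'load' step)."""
    # find_library may return None when the library cannot be located.
    path = ctypes.util.find_library(name)
    # Assumption: POSIX. CDLL(None) exposes symbols already loaded into the process.
    return ctypes.CDLL(path)


def lookup_double_fn(lib, symbol):
    """Resolve a double -> double function by name (the 'lookup' step)."""
    try:
        fn = getattr(lib, symbol)
    except AttributeError:
        # Symbol not exported: the caller has to fall back to another path.
        return None
    fn.restype = ctypes.c_double
    fn.argtypes = [ctypes.c_double]
    return fn


libm = load_library("m")
cos_fn = lookup_double_fn(libm, "cos")
missing = lookup_double_fn(libm, "no_such_symbol_hypothetical")

# Use the resolved native entry point if present, else a pure-Python fallback.
result = cos_fn(0.0) if cos_fn is not None else math.cos(0.0)
print(result)
```

Returning `None` for an unresolved symbol keeps the fallback decision with the caller, which is one simple way to handle a library or entry point that is not available.
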
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v16]

2021-06-02 Thread Sandhya Viswanathan

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v15]

2021-05-25 Thread Sandhya Viswanathan

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v14]

2021-05-25 Thread Sandhya Viswanathan

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v13]

2021-05-19 Thread Sandhya Viswanathan

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v12]

2021-05-19 Thread Sandhya Viswanathan

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v11]

2021-05-19 Thread Paul Sandoz
On Wed, 19 May 2021 22:16:18 GMT, Sandhya Viswanathan wrote:

>> This PR contains Short Vector Math Library support related changes for
>> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414),
>> in preparation for when the JEP is targeted.
>>
>> Intel Short Vector Math Library (SVML) based intrinsics in native x86
>> assembly provide optimized implementations of the Vector API transcendental and
>> trigonometric methods.
>> These methods are built into a separate library instead of being part of
>> libjvm.so or jvm.dll.
>> 
>> The following changes are made:
>> * The source for these methods is placed in the jdk.incubator.vector module
>>   under src/jdk.incubator.vector/linux/native/libsvml and
>>   src/jdk.incubator.vector/windows/native/libsvml.
>> * The assembly source files are named “*.S” and include files are named
>>   “*.S.inc”.
>> * The corresponding build script is placed at
>>   make/modules/jdk.incubator.vector/Lib.gmk.
>> * Changes are made to the build system to support dependency tracking for
>>   assembly files with includes.
>> * The built native libraries (libsvml.so/svml.dll) are placed in the bin
>>   directory of the JDK on Windows and the lib directory of the JDK on Linux.
>> * The C2 JIT uses dll_load and dll_lookup to get the addresses of
>>   optimized methods from this library.
>> 
>> Build system changes and module library build scripts are contributed by 
>> Magnus (magnus.ihse.bur...@oracle.com).
>> 
>> Looking forward to your review and feedback.
>> 
>> Performance:
>> Micro benchmark Base Optimized Unit Gain(Optimized/Base)
>> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
>> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05
>> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
>> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79
>> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55
>> Double128Vector.COS 49.94 245.89 ops/ms 4.92
>> Double128Vector.COSH 26.91 126.00 ops/ms 4.68
>> Double128Vector.EXP 71.64 379.65 ops/ms 5.30
>> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18
>> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44
>> Double128Vector.LOG 61.95 279.84 ops/ms 4.52
>> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03
>> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79
>> Double128Vector.SIN 49.36 240.79 ops/ms 4.88
>> Double128Vector.SINH 26.59 103.75 ops/ms 3.90
>> Double128Vector.TAN 41.05 152.39 ops/ms 3.71
>> Double128Vector.TANH 45.29 169.53 ops/ms 3.74
>> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96
>> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01
>> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78
>> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44
>> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04
>> Double256Vector.COS 58.26 389.77 ops/ms 6.69
>> Double256Vector.COSH 29.44 151.11 ops/ms 5.13
>> Double256Vector.EXP 86.67 564.68 ops/ms 6.52
>> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80
>> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62
>> Double256Vector.LOG 71.52 394.90 ops/ms 5.52
>> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54
>> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05
>> Double256Vector.SIN 57.06 380.98 ops/ms 6.68
>> Double256Vector.SINH 29.40 117.37 ops/ms 3.99
>> Double256Vector.TAN 44.90 279.90 ops/ms 6.23
>> Double256Vector.TANH 54.08 274.71 ops/ms 5.08
>> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35
>> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57
>> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04
>> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32
>> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69
>> Double512Vector.COS 59.88 837.04 ops/ms 13.98
>> Double512Vector.COSH 30.34 172.76 ops/ms 5.70
>> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14
>> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34
>> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34
>> Double512Vector.LOG 74.84 996.00 ops/ms 13.31
>> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72
>> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
>> Double512Vector.POW 37.42 384.13 ops/ms 10.26
>> Double512Vector.SIN 59.74 728.45 ops/ms 12.19
>> Double512Vector.SINH 29.47 143.38 ops/ms 4.87
>> Double512Vector.TAN 46.20 587.21 ops/ms 12.71
>> Double512Vector.TANH 57.36 495.42 ops/ms 8.64
>> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06
>> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16
>> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44
>> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28
>> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53
>> Double64Vector.COS 23.42 152.01 ops/ms 6.49
>> Double64Vector.COSH 17.34 113.34 ops/ms 6.54
>> Double64Vector.EXP 27.08 203.53 ops/ms 7.52
>> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15
>> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59
>> Double64Vector.LOG 26.75 142.63 ops/ms 5.33
>> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40
>> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38
>> Double64Vector.SIN 23.28 146.91 ops/ms 6.31
>> Double64Vector.SINH 17.62 88.59 ops/ms 5.03
>> Double64Vector.TAN 21.00 86.43 ops/ms 4.12
>> Double64Vector.TANH 23.75 111.35 ops/ms 4.69
>> Float128Vector.ACOS 57.52 110.65 ops/m

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v11]

2021-05-19 Thread Sandhya Viswanathan

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]

2021-05-19 Thread Sandhya Viswanathan
On Wed, 19 May 2021 22:02:14 GMT, Paul Sandoz wrote:

>> Tier 1 to 3 tests pass for the default set of build profiles.
>
>> Thanks a lot for the review @PaulSandoz @iwanowww @erikj79.
>> Paul and Vladimir, I have implemented your review comments. Please take a 
>> look.
> 
> `case VECTOR_OP_OR` is still present.

@PaulSandoz Thanks for pointing that out. I had forgotten to `git add` some of 
the files.

-

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]

2021-05-19 Thread Paul Sandoz
On Mon, 3 May 2021 21:41:26 GMT, Paul Sandoz wrote:

>> Sandhya Viswanathan has updated the pull request with a new target base due 
>> to a merge or a rebase. The pull request now contains six commits:
>> 
>>  - Merge master
>>  - remove whitespace
>>  - Merge master
>>  - Small fix
>>  - cleanup
>>  - x86 short vector math optimization for Vector API
>
> Tier 1 to 3 tests pass for the default set of build profiles.

> Thanks a lot for the review @PaulSandoz @iwanowww @erikj79.
> Paul and Vladimir, I have implemented your review comments. Please take a 
> look.

`case VECTOR_OP_OR` is still present.

-

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]

2021-05-19 Thread Sandhya Viswanathan
On Mon, 3 May 2021 21:41:26 GMT, Paul Sandoz wrote:

>> Sandhya Viswanathan has updated the pull request with a new target base due 
>> to a merge or a rebase. The pull request now contains six commits:
>> 
>>  - Merge master
>>  - remove whitespace
>>  - Merge master
>>  - Small fix
>>  - cleanup
>>  - x86 short vector math optimization for Vector API
>
> Tier 1 to 3 tests pass for the default set of build profiles.

Thanks a lot for the review @PaulSandoz @iwanowww @erikj79.
Paul and Vladimir, I have implemented your review comments. Please take a look.

-

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v10]

2021-05-19 Thread Sandhya Viswanathan

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v9]

2021-05-19 Thread Paul Sandoz
On Wed, 19 May 2021 03:37:11 GMT, Sandhya Viswanathan wrote:


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v9]

2021-05-19 Thread Vladimir Ivanov
On Wed, 19 May 2021 03:37:11 GMT, Sandhya Viswanathan wrote:


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v9]

2021-05-18 Thread Sandhya Viswanathan

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v8]

2021-05-18 Thread Sandhya Viswanathan
On Wed, 19 May 2021 00:58:15 GMT, Sandhya Viswanathan wrote:


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v8]

2021-05-18 Thread Vladimir Kozlov
On Wed, 19 May 2021 00:58:15 GMT, Sandhya Viswanathan wrote:


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v7]

2021-05-18 Thread Sandhya Viswanathan
On Wed, 19 May 2021 00:26:48 GMT, Vladimir Kozlov wrote:

>> Sandhya Viswanathan has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   jcheck fixes
>
> This is much, much better! Thank you for changing it. I am now only asking you
> to add a comment explaining the names.

@vnkozlov I have added comments explaining the naming convention. Please let me
know if this looks OK.

-

PR: https://git.openjdk.java.net/jdk/pull/3638
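
The Gain column in the benchmark table quoted in this thread is labeled Optimized/Base, so each reported gain can be rechecked from the two throughput columns. The following is a minimal sketch of that arithmetic, not part of the PR: the four rows are copied from the table, while the helper names (`parse_row`, `recomputed_gain`, `check_rows`) and the 0.01 rounding tolerance are assumptions made for illustration.

```python
# Rows copied verbatim from the benchmark table in this thread:
# name, Base (ops/ms), Optimized (ops/ms), unit, reported Gain.
ROWS = """\
Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
Double64Vector.EXP 27.08 203.53 ops/ms 7.52
"""


def parse_row(line):
    """Split one table row into typed fields (hypothetical helper)."""
    name, base, optimized, unit, gain = line.split()
    return name, float(base), float(optimized), unit, float(gain)


def recomputed_gain(base, optimized):
    """Gain as defined by the table header: Optimized / Base."""
    return optimized / base


def check_rows(text, tolerance=0.01):
    """Return (name, gain rounded to 2 places, matches reported gain) per row.

    The tolerance is an assumption: it allows for the reported gains
    having been rounded to two decimal places.
    """
    results = []
    for line in text.splitlines():
        name, base, optimized, _unit, reported = parse_row(line)
        gain = recomputed_gain(base, optimized)
        results.append((name, round(gain, 2), abs(gain - reported) <= tolerance))
    return results


if __name__ == "__main__":
    for name, gain, ok in check_rows(ROWS):
        print(f"{name}: recomputed gain {gain:.2f}, matches reported: {ok}")
```

Since the unit is ops/ms (throughput), a gain above 1 means the SVML-backed version completes more operations per millisecond than the baseline.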


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v8]

2021-05-18 Thread Sandhya Viswanathan

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v7]

2021-05-18 Thread Vladimir Kozlov
On Tue, 18 May 2021 23:59:28 GMT, Sandhya Viswanathan wrote:


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v7]

2021-05-18 Thread Sandhya Viswanathan

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v6]

2021-05-18 Thread Sandhya Viswanathan
On Tue, 18 May 2021 23:43:13 GMT, Sandhya Viswanathan wrote:


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v6]

2021-05-18 Thread Sandhya Viswanathan
> This PR contains Short Vector Math Library support related changes for
> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414),
> in preparation for when the JEP is targeted.
>
> Intel Short Vector Math Library (SVML) based intrinsics in native x86
> assembly provide optimized implementations of the Vector API transcendental
> and trigonometric methods.
> These methods are built into a separate library instead of being part of
> libjvm.so or jvm.dll.
>
> The following changes are made:
> - The source for these methods is placed in the jdk.incubator.vector module
>   under src/jdk.incubator.vector/linux/native/libsvml and
>   src/jdk.incubator.vector/windows/native/libsvml.
> - The assembly source files are named “*.S” and include files are named
>   “*.S.inc”.
> - The corresponding build script is placed at
>   make/modules/jdk.incubator.vector/Lib.gmk.
> - Changes are made to the build system to support dependency tracking for
>   assembly files with includes.
> - The built native libraries (libsvml.so/svml.dll) are placed in the bin
>   directory of the JDK on Windows and the lib directory of the JDK on Linux.
> - The C2 JIT uses dll_load and dll_lookup to get the addresses of
>   optimized methods from this library.
> 
> Build system changes and module library build scripts are contributed by 
> Magnus (magnus.ihse.bur...@oracle.com).
> 
> Looking forward to your review and feedback.
> 
> Performance:
> Micro benchmark Base Optimized Unit Gain(Optimized/Base)
> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05
> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79
> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55
> Double128Vector.COS 49.94 245.89 ops/ms 4.92
> Double128Vector.COSH 26.91 126.00 ops/ms 4.68
> Double128Vector.EXP 71.64 379.65 ops/ms 5.30
> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18
> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44
> Double128Vector.LOG 61.95 279.84 ops/ms 4.52
> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03
> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79
> Double128Vector.SIN 49.36 240.79 ops/ms 4.88
> Double128Vector.SINH 26.59 103.75 ops/ms 3.90
> Double128Vector.TAN 41.05 152.39 ops/ms 3.71
> Double128Vector.TANH 45.29 169.53 ops/ms 3.74
> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96
> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01
> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78
> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44
> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04
> Double256Vector.COS 58.26 389.77 ops/ms 6.69
> Double256Vector.COSH 29.44 151.11 ops/ms 5.13
> Double256Vector.EXP 86.67 564.68 ops/ms 6.52
> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80
> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62
> Double256Vector.LOG 71.52 394.90 ops/ms 5.52
> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54
> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05
> Double256Vector.SIN 57.06 380.98 ops/ms 6.68
> Double256Vector.SINH 29.40 117.37 ops/ms 3.99
> Double256Vector.TAN 44.90 279.90 ops/ms 6.23
> Double256Vector.TANH 54.08 274.71 ops/ms 5.08
> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35
> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57
> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04
> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32
> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69
> Double512Vector.COS 59.88 837.04 ops/ms 13.98
> Double512Vector.COSH 30.34 172.76 ops/ms 5.70
> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14
> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34
> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34
> Double512Vector.LOG 74.84 996.00 ops/ms 13.31
> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72
> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
> Double512Vector.POW 37.42 384.13 ops/ms 10.26
> Double512Vector.SIN 59.74 728.45 ops/ms 12.19
> Double512Vector.SINH 29.47 143.38 ops/ms 4.87
> Double512Vector.TAN 46.20 587.21 ops/ms 12.71
> Double512Vector.TANH 57.36 495.42 ops/ms 8.64
> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06
> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16
> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44
> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28
> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53
> Double64Vector.COS 23.42 152.01 ops/ms 6.49
> Double64Vector.COSH 17.34 113.34 ops/ms 6.54
> Double64Vector.EXP 27.08 203.53 ops/ms 7.52
> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15
> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59
> Double64Vector.LOG 26.75 142.63 ops/ms 5.33
> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40
> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38
> Double64Vector.SIN 23.28 146.91 ops/ms 6.31
> Double64Vector.SINH 17.62 88.59 ops/ms 5.03
> Double64Vector.TAN 21.00 86.43 ops/ms 4.12
> Double64Vector.TANH 23.75 111.35 ops/ms 4.69
> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92
> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06
> Float128Vector.ATAN 22.52 318.74 ops/ms 14.15
> Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42
> Float128Vecto
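
The Gain column in the table above is the ratio of optimized to base throughput (both in ops/ms). As a minimal sketch of that arithmetic, the snippet below recomputes the ratio for three rows copied from the table; the names `ROWS`, `gain`, and `GAINS` are illustrative only and are not part of the PR or the benchmark harness.

```python
# Rows copied from the performance table above:
# (benchmark name, base ops/ms, optimized ops/ms).
ROWS = [
    ("Double128Vector.ACOS", 45.91, 87.34),
    ("Double256Vector.LOG1P", 19.99, 300.88),
    ("Double512Vector.EXP", 99.66, 1608.12),
]


def gain(base: float, optimized: float) -> float:
    """Return the speedup ratio Optimized/Base, rounded to two decimals."""
    return round(optimized / base, 2)


# Hypothetical helper mapping: benchmark name -> recomputed gain.
GAINS = {name: gain(base, opt) for name, base, opt in ROWS}

for name, value in GAINS.items():
    print(f"{name}: {value:.2f}x")
```

The recomputed ratios agree with the Gain values listed for these three rows, which suggests the column is a plain throughput ratio with no further normalization.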

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v5]

2021-05-18 Thread Sandhya Viswanathan
> This PR contains Short Vector Math Library support related changes for
> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414),
> in preparation for when it is targeted.
> 
> Intel Short Vector Math Library (SVML) based intrinsics in native x86
> assembly provide optimized implementations of the Vector API transcendental
> and trigonometric methods.
> These methods are built into a separate library instead of being part of
> libjvm.so or jvm.dll.
> 
> The following changes are made:
> - The source for these methods is placed in the jdk.incubator.vector module
>   under src/jdk.incubator.vector/linux/native/libsvml and
>   src/jdk.incubator.vector/windows/native/libsvml.
> - The assembly source files are named “*.S” and the include files are named
>   “*.S.inc”.
> - The corresponding build script is placed at
>   make/modules/jdk.incubator.vector/Lib.gmk.
> - Changes are made to the build system to support dependency tracking for
>   assembly files with includes.
> - The built native libraries (libsvml.so/svml.dll) are placed in the bin
>   directory of the JDK on Windows and the lib directory of the JDK on Linux.
> - The C2 JIT uses dll_load and dll_lookup to get the addresses of the
>   optimized methods from this library.
> 
> Build system changes and module library build scripts are contributed by 
> Magnus (magnus.ihse.bur...@oracle.com).
> 
> Looking forward to your review and feedback.
> 
> Performance:
> Micro benchmark Base Optimized Unit Gain(Optimized/Base)
> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05
> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79
> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55
> Double128Vector.COS 49.94 245.89 ops/ms 4.92
> Double128Vector.COSH 26.91 126.00 ops/ms 4.68
> Double128Vector.EXP 71.64 379.65 ops/ms 5.30
> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18
> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44
> Double128Vector.LOG 61.95 279.84 ops/ms 4.52
> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03
> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79
> Double128Vector.SIN 49.36 240.79 ops/ms 4.88
> Double128Vector.SINH 26.59 103.75 ops/ms 3.90
> Double128Vector.TAN 41.05 152.39 ops/ms 3.71
> Double128Vector.TANH 45.29 169.53 ops/ms 3.74
> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96
> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01
> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78
> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44
> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04
> Double256Vector.COS 58.26 389.77 ops/ms 6.69
> Double256Vector.COSH 29.44 151.11 ops/ms 5.13
> Double256Vector.EXP 86.67 564.68 ops/ms 6.52
> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80
> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62
> Double256Vector.LOG 71.52 394.90 ops/ms 5.52
> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54
> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05
> Double256Vector.SIN 57.06 380.98 ops/ms 6.68
> Double256Vector.SINH 29.40 117.37 ops/ms 3.99
> Double256Vector.TAN 44.90 279.90 ops/ms 6.23
> Double256Vector.TANH 54.08 274.71 ops/ms 5.08
> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35
> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57
> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04
> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32
> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69
> Double512Vector.COS 59.88 837.04 ops/ms 13.98
> Double512Vector.COSH 30.34 172.76 ops/ms 5.70
> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14
> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34
> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34
> Double512Vector.LOG 74.84 996.00 ops/ms 13.31
> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72
> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
> Double512Vector.POW 37.42 384.13 ops/ms 10.26
> Double512Vector.SIN 59.74 728.45 ops/ms 12.19
> Double512Vector.SINH 29.47 143.38 ops/ms 4.87
> Double512Vector.TAN 46.20 587.21 ops/ms 12.71
> Double512Vector.TANH 57.36 495.42 ops/ms 8.64
> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06
> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16
> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44
> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28
> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53
> Double64Vector.COS 23.42 152.01 ops/ms 6.49
> Double64Vector.COSH 17.34 113.34 ops/ms 6.54
> Double64Vector.EXP 27.08 203.53 ops/ms 7.52
> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15
> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59
> Double64Vector.LOG 26.75 142.63 ops/ms 5.33
> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40
> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38
> Double64Vector.SIN 23.28 146.91 ops/ms 6.31
> Double64Vector.SINH 17.62 88.59 ops/ms 5.03
> Double64Vector.TAN 21.00 86.43 ops/ms 4.12
> Double64Vector.TANH 23.75 111.35 ops/ms 4.69
> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92
> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06
> Float128Vector.ATAN 22.52 318.74 ops/ms 14.15
> Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42
> Float128Vecto

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v4]

2021-05-17 Thread Vladimir Kozlov
On Sat, 15 May 2021 02:06:29 GMT, Sandhya Viswanathan wrote:

>> This PR contains Short Vector Math Library support related changes for
>> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414),
>> in preparation for when it is targeted.
>> 
>> Intel Short Vector Math Library (SVML) based intrinsics in native x86
>> assembly provide optimized implementations of the Vector API transcendental
>> and trigonometric methods.
>> These methods are built into a separate library instead of being part of
>> libjvm.so or jvm.dll.
>> 
>> The following changes are made:
>> - The source for these methods is placed in the jdk.incubator.vector module
>>   under src/jdk.incubator.vector/linux/native/libsvml and
>>   src/jdk.incubator.vector/windows/native/libsvml.
>> - The assembly source files are named “*.S” and the include files are named
>>   “*.S.inc”.
>> - The corresponding build script is placed at
>>   make/modules/jdk.incubator.vector/Lib.gmk.
>> - Changes are made to the build system to support dependency tracking for
>>   assembly files with includes.
>> - The built native libraries (libsvml.so/svml.dll) are placed in the bin
>>   directory of the JDK on Windows and the lib directory of the JDK on Linux.
>> - The C2 JIT uses dll_load and dll_lookup to get the addresses of the
>>   optimized methods from this library.
>> 
>> Build system changes and module library build scripts are contributed by 
>> Magnus (magnus.ihse.bur...@oracle.com).
>> 
>> Looking forward to your review and feedback.
>> 
>> Performance:
>> Micro benchmark Base Optimized Unit Gain(Optimized/Base)
>> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
>> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05
>> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
>> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79
>> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55
>> Double128Vector.COS 49.94 245.89 ops/ms 4.92
>> Double128Vector.COSH 26.91 126.00 ops/ms 4.68
>> Double128Vector.EXP 71.64 379.65 ops/ms 5.30
>> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18
>> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44
>> Double128Vector.LOG 61.95 279.84 ops/ms 4.52
>> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03
>> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79
>> Double128Vector.SIN 49.36 240.79 ops/ms 4.88
>> Double128Vector.SINH 26.59 103.75 ops/ms 3.90
>> Double128Vector.TAN 41.05 152.39 ops/ms 3.71
>> Double128Vector.TANH 45.29 169.53 ops/ms 3.74
>> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96
>> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01
>> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78
>> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44
>> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04
>> Double256Vector.COS 58.26 389.77 ops/ms 6.69
>> Double256Vector.COSH 29.44 151.11 ops/ms 5.13
>> Double256Vector.EXP 86.67 564.68 ops/ms 6.52
>> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80
>> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62
>> Double256Vector.LOG 71.52 394.90 ops/ms 5.52
>> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54
>> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05
>> Double256Vector.SIN 57.06 380.98 ops/ms 6.68
>> Double256Vector.SINH 29.40 117.37 ops/ms 3.99
>> Double256Vector.TAN 44.90 279.90 ops/ms 6.23
>> Double256Vector.TANH 54.08 274.71 ops/ms 5.08
>> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35
>> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57
>> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04
>> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32
>> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69
>> Double512Vector.COS 59.88 837.04 ops/ms 13.98
>> Double512Vector.COSH 30.34 172.76 ops/ms 5.70
>> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14
>> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34
>> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34
>> Double512Vector.LOG 74.84 996.00 ops/ms 13.31
>> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72
>> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
>> Double512Vector.POW 37.42 384.13 ops/ms 10.26
>> Double512Vector.SIN 59.74 728.45 ops/ms 12.19
>> Double512Vector.SINH 29.47 143.38 ops/ms 4.87
>> Double512Vector.TAN 46.20 587.21 ops/ms 12.71
>> Double512Vector.TANH 57.36 495.42 ops/ms 8.64
>> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06
>> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16
>> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44
>> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28
>> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53
>> Double64Vector.COS 23.42 152.01 ops/ms 6.49
>> Double64Vector.COSH 17.34 113.34 ops/ms 6.54
>> Double64Vector.EXP 27.08 203.53 ops/ms 7.52
>> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15
>> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59
>> Double64Vector.LOG 26.75 142.63 ops/ms 5.33
>> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40
>> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38
>> Double64Vector.SIN 23.28 146.91 ops/ms 6.31
>> Double64Vector.SINH 17.62 88.59 ops/ms 5.03
>> Double64Vector.TAN 21.00 86.43 ops/ms 4.12
>> Double64Vector.TANH 23.75 111.35 ops/ms 4.69
>> Float128Vector.ACOS 57.52 110.65 ops/m

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v4]

2021-05-17 Thread Erik Joelsson
On Sat, 15 May 2021 02:06:29 GMT, Sandhya Viswanathan wrote:

>> This PR contains Short Vector Math Library support related changes for
>> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414),
>> in preparation for when it is targeted.
>> 
>> Intel Short Vector Math Library (SVML) based intrinsics in native x86
>> assembly provide optimized implementations of the Vector API transcendental
>> and trigonometric methods.
>> These methods are built into a separate library instead of being part of
>> libjvm.so or jvm.dll.
>> 
>> The following changes are made:
>> - The source for these methods is placed in the jdk.incubator.vector module
>>   under src/jdk.incubator.vector/linux/native/libsvml and
>>   src/jdk.incubator.vector/windows/native/libsvml.
>> - The assembly source files are named “*.S” and the include files are named
>>   “*.S.inc”.
>> - The corresponding build script is placed at
>>   make/modules/jdk.incubator.vector/Lib.gmk.
>> - Changes are made to the build system to support dependency tracking for
>>   assembly files with includes.
>> - The built native libraries (libsvml.so/svml.dll) are placed in the bin
>>   directory of the JDK on Windows and the lib directory of the JDK on Linux.
>> - The C2 JIT uses dll_load and dll_lookup to get the addresses of the
>>   optimized methods from this library.
>> 
>> Build system changes and module library build scripts are contributed by 
>> Magnus (magnus.ihse.bur...@oracle.com).
>> 
>> Looking forward to your review and feedback.
>> 
>> Performance:
>> Micro benchmark Base Optimized Unit Gain(Optimized/Base)
>> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
>> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05
>> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
>> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79
>> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55
>> Double128Vector.COS 49.94 245.89 ops/ms 4.92
>> Double128Vector.COSH 26.91 126.00 ops/ms 4.68
>> Double128Vector.EXP 71.64 379.65 ops/ms 5.30
>> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18
>> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44
>> Double128Vector.LOG 61.95 279.84 ops/ms 4.52
>> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03
>> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79
>> Double128Vector.SIN 49.36 240.79 ops/ms 4.88
>> Double128Vector.SINH 26.59 103.75 ops/ms 3.90
>> Double128Vector.TAN 41.05 152.39 ops/ms 3.71
>> Double128Vector.TANH 45.29 169.53 ops/ms 3.74
>> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96
>> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01
>> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78
>> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44
>> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04
>> Double256Vector.COS 58.26 389.77 ops/ms 6.69
>> Double256Vector.COSH 29.44 151.11 ops/ms 5.13
>> Double256Vector.EXP 86.67 564.68 ops/ms 6.52
>> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80
>> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62
>> Double256Vector.LOG 71.52 394.90 ops/ms 5.52
>> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54
>> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05
>> Double256Vector.SIN 57.06 380.98 ops/ms 6.68
>> Double256Vector.SINH 29.40 117.37 ops/ms 3.99
>> Double256Vector.TAN 44.90 279.90 ops/ms 6.23
>> Double256Vector.TANH 54.08 274.71 ops/ms 5.08
>> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35
>> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57
>> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04
>> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32
>> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69
>> Double512Vector.COS 59.88 837.04 ops/ms 13.98
>> Double512Vector.COSH 30.34 172.76 ops/ms 5.70
>> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14
>> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34
>> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34
>> Double512Vector.LOG 74.84 996.00 ops/ms 13.31
>> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72
>> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
>> Double512Vector.POW 37.42 384.13 ops/ms 10.26
>> Double512Vector.SIN 59.74 728.45 ops/ms 12.19
>> Double512Vector.SINH 29.47 143.38 ops/ms 4.87
>> Double512Vector.TAN 46.20 587.21 ops/ms 12.71
>> Double512Vector.TANH 57.36 495.42 ops/ms 8.64
>> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06
>> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16
>> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44
>> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28
>> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53
>> Double64Vector.COS 23.42 152.01 ops/ms 6.49
>> Double64Vector.COSH 17.34 113.34 ops/ms 6.54
>> Double64Vector.EXP 27.08 203.53 ops/ms 7.52
>> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15
>> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59
>> Double64Vector.LOG 26.75 142.63 ops/ms 5.33
>> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40
>> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38
>> Double64Vector.SIN 23.28 146.91 ops/ms 6.31
>> Double64Vector.SINH 17.62 88.59 ops/ms 5.03
>> Double64Vector.TAN 21.00 86.43 ops/ms 4.12
>> Double64Vector.TANH 23.75 111.35 ops/ms 4.69
>> Float128Vector.ACOS 57.52 110.65 ops/m

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v4]

2021-05-14 Thread Sandhya Viswanathan
> This PR contains Short Vector Math Library support related changes for
> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414),
> in preparation for when it is targeted.
> 
> Intel Short Vector Math Library (SVML) based intrinsics in native x86
> assembly provide optimized implementations of the Vector API transcendental
> and trigonometric methods.
> These methods are built into a separate library instead of being part of
> libjvm.so or jvm.dll.
> 
> The following changes are made:
> - The source for these methods is placed in the jdk.incubator.vector module
>   under src/jdk.incubator.vector/linux/native/libsvml and
>   src/jdk.incubator.vector/windows/native/libsvml.
> - The assembly source files are named “*.S” and the include files are named
>   “*.S.inc”.
> - The corresponding build script is placed at
>   make/modules/jdk.incubator.vector/Lib.gmk.
> - Changes are made to the build system to support dependency tracking for
>   assembly files with includes.
> - The built native libraries (libsvml.so/svml.dll) are placed in the bin
>   directory of the JDK on Windows and the lib directory of the JDK on Linux.
> - The C2 JIT uses dll_load and dll_lookup to get the addresses of the
>   optimized methods from this library.
> 
> Build system changes and module library build scripts are contributed by 
> Magnus (magnus.ihse.bur...@oracle.com).
> 
> Looking forward to your review and feedback.
> 
> Performance:
> Micro benchmark Base Optimized Unit Gain(Optimized/Base)
> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05
> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79
> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55
> Double128Vector.COS 49.94 245.89 ops/ms 4.92
> Double128Vector.COSH 26.91 126.00 ops/ms 4.68
> Double128Vector.EXP 71.64 379.65 ops/ms 5.30
> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18
> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44
> Double128Vector.LOG 61.95 279.84 ops/ms 4.52
> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03
> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79
> Double128Vector.SIN 49.36 240.79 ops/ms 4.88
> Double128Vector.SINH 26.59 103.75 ops/ms 3.90
> Double128Vector.TAN 41.05 152.39 ops/ms 3.71
> Double128Vector.TANH 45.29 169.53 ops/ms 3.74
> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96
> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01
> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78
> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44
> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04
> Double256Vector.COS 58.26 389.77 ops/ms 6.69
> Double256Vector.COSH 29.44 151.11 ops/ms 5.13
> Double256Vector.EXP 86.67 564.68 ops/ms 6.52
> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80
> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62
> Double256Vector.LOG 71.52 394.90 ops/ms 5.52
> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54
> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05
> Double256Vector.SIN 57.06 380.98 ops/ms 6.68
> Double256Vector.SINH 29.40 117.37 ops/ms 3.99
> Double256Vector.TAN 44.90 279.90 ops/ms 6.23
> Double256Vector.TANH 54.08 274.71 ops/ms 5.08
> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35
> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57
> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04
> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32
> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69
> Double512Vector.COS 59.88 837.04 ops/ms 13.98
> Double512Vector.COSH 30.34 172.76 ops/ms 5.70
> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14
> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34
> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34
> Double512Vector.LOG 74.84 996.00 ops/ms 13.31
> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72
> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
> Double512Vector.POW 37.42 384.13 ops/ms 10.26
> Double512Vector.SIN 59.74 728.45 ops/ms 12.19
> Double512Vector.SINH 29.47 143.38 ops/ms 4.87
> Double512Vector.TAN 46.20 587.21 ops/ms 12.71
> Double512Vector.TANH 57.36 495.42 ops/ms 8.64
> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06
> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16
> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44
> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28
> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53
> Double64Vector.COS 23.42 152.01 ops/ms 6.49
> Double64Vector.COSH 17.34 113.34 ops/ms 6.54
> Double64Vector.EXP 27.08 203.53 ops/ms 7.52
> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15
> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59
> Double64Vector.LOG 26.75 142.63 ops/ms 5.33
> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40
> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38
> Double64Vector.SIN 23.28 146.91 ops/ms 6.31
> Double64Vector.SINH 17.62 88.59 ops/ms 5.03
> Double64Vector.TAN 21.00 86.43 ops/ms 4.12
> Double64Vector.TANH 23.75 111.35 ops/ms 4.69
> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92
> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06
> Float128Vector.ATAN 22.52 318.74 ops/ms 14.15
> Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42
> Float128Vecto

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v3]

2021-05-14 Thread Sandhya Viswanathan
> This PR contains Short Vector Math Library support related changes for
> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414),
> in preparation for when it is targeted.
> 
> Intel Short Vector Math Library (SVML) based intrinsics in native x86
> assembly provide optimized implementations of the Vector API transcendental
> and trigonometric methods.
> These methods are built into a separate library instead of being part of
> libjvm.so or jvm.dll.
> 
> The following changes are made:
> - The source for these methods is placed in the jdk.incubator.vector module
>   under src/jdk.incubator.vector/linux/native/libsvml and
>   src/jdk.incubator.vector/windows/native/libsvml.
> - The assembly source files are named “*.S” and the include files are named
>   “*.S.inc”.
> - The corresponding build script is placed at
>   make/modules/jdk.incubator.vector/Lib.gmk.
> - Changes are made to the build system to support dependency tracking for
>   assembly files with includes.
> - The built native libraries (libsvml.so/svml.dll) are placed in the bin
>   directory of the JDK on Windows and the lib directory of the JDK on Linux.
> - The C2 JIT uses dll_load and dll_lookup to get the addresses of the
>   optimized methods from this library.
> 
> Build system changes and module library build scripts are contributed by 
> Magnus (magnus.ihse.bur...@oracle.com).
> 
> Looking forward to your review and feedback.
> 
> Performance:
> Micro benchmark Base Optimized Unit Gain(Optimized/Base)
> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05
> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79
> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55
> Double128Vector.COS 49.94 245.89 ops/ms 4.92
> Double128Vector.COSH 26.91 126.00 ops/ms 4.68
> Double128Vector.EXP 71.64 379.65 ops/ms 5.30
> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18
> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44
> Double128Vector.LOG 61.95 279.84 ops/ms 4.52
> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03
> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79
> Double128Vector.SIN 49.36 240.79 ops/ms 4.88
> Double128Vector.SINH 26.59 103.75 ops/ms 3.90
> Double128Vector.TAN 41.05 152.39 ops/ms 3.71
> Double128Vector.TANH 45.29 169.53 ops/ms 3.74
> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96
> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01
> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78
> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44
> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04
> Double256Vector.COS 58.26 389.77 ops/ms 6.69
> Double256Vector.COSH 29.44 151.11 ops/ms 5.13
> Double256Vector.EXP 86.67 564.68 ops/ms 6.52
> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80
> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62
> Double256Vector.LOG 71.52 394.90 ops/ms 5.52
> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54
> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05
> Double256Vector.SIN 57.06 380.98 ops/ms 6.68
> Double256Vector.SINH 29.40 117.37 ops/ms 3.99
> Double256Vector.TAN 44.90 279.90 ops/ms 6.23
> Double256Vector.TANH 54.08 274.71 ops/ms 5.08
> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35
> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57
> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04
> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32
> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69
> Double512Vector.COS 59.88 837.04 ops/ms 13.98
> Double512Vector.COSH 30.34 172.76 ops/ms 5.70
> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14
> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34
> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34
> Double512Vector.LOG 74.84 996.00 ops/ms 13.31
> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72
> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
> Double512Vector.POW 37.42 384.13 ops/ms 10.26
> Double512Vector.SIN 59.74 728.45 ops/ms 12.19
> Double512Vector.SINH 29.47 143.38 ops/ms 4.87
> Double512Vector.TAN 46.20 587.21 ops/ms 12.71
> Double512Vector.TANH 57.36 495.42 ops/ms 8.64
> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06
> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16
> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44
> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28
> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53
> Double64Vector.COS 23.42 152.01 ops/ms 6.49
> Double64Vector.COSH 17.34 113.34 ops/ms 6.54
> Double64Vector.EXP 27.08 203.53 ops/ms 7.52
> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15
> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59
> Double64Vector.LOG 26.75 142.63 ops/ms 5.33
> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40
> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38
> Double64Vector.SIN 23.28 146.91 ops/ms 6.31
> Double64Vector.SINH 17.62 88.59 ops/ms 5.03
> Double64Vector.TAN 21.00 86.43 ops/ms 4.12
> Double64Vector.TANH 23.75 111.35 ops/ms 4.69
> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92
> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06
> Float128Vector.ATAN 22.52 318.74 ops/ms 14.15
> Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42
> Float128Vecto

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]

2021-05-04 Thread Sandhya Viswanathan
On Wed, 28 Apr 2021 21:11:26 GMT, Sandhya Viswanathan wrote:

>> This PR contains Short Vector Math Library support related changes for
>> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414),
>> in preparation for when it is targeted.
>> 
>> Intel Short Vector Math Library (SVML) based intrinsics in native x86
>> assembly provide optimized implementations of the Vector API transcendental
>> and trigonometric methods.
>> These methods are built into a separate library instead of being part of
>> libjvm.so or jvm.dll.
>> 
>> The following changes are made:
>> - The source for these methods is placed in the jdk.incubator.vector module
>>   under src/jdk.incubator.vector/linux/native/libsvml and
>>   src/jdk.incubator.vector/windows/native/libsvml.
>> - The assembly source files are named “*.S” and the include files are named
>>   “*.S.inc”.
>> - The corresponding build script is placed at
>>   make/modules/jdk.incubator.vector/Lib.gmk.
>> - Changes are made to the build system to support dependency tracking for
>>   assembly files with includes.
>> - The built native libraries (libsvml.so/svml.dll) are placed in the bin
>>   directory of the JDK on Windows and the lib directory of the JDK on Linux.
>> - The C2 JIT uses dll_load and dll_lookup to get the addresses of the
>>   optimized methods from this library.
>> 
>> Build system changes and module library build scripts are contributed by 
>> Magnus (magnus.ihse.bur...@oracle.com).
>> 
>> Looking forward to your review and feedback.
>> 
>> Performance:
>> Micro benchmark Base Optimized Unit Gain(Optimized/Base)
>> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
>> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05
>> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
>> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79
>> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55
>> Double128Vector.COS 49.94 245.89 ops/ms 4.92
>> Double128Vector.COSH 26.91 126.00 ops/ms 4.68
>> Double128Vector.EXP 71.64 379.65 ops/ms 5.30
>> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18
>> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44
>> Double128Vector.LOG 61.95 279.84 ops/ms 4.52
>> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03
>> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79
>> Double128Vector.SIN 49.36 240.79 ops/ms 4.88
>> Double128Vector.SINH 26.59 103.75 ops/ms 3.90
>> Double128Vector.TAN 41.05 152.39 ops/ms 3.71
>> Double128Vector.TANH 45.29 169.53 ops/ms 3.74
>> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96
>> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01
>> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78
>> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44
>> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04
>> Double256Vector.COS 58.26 389.77 ops/ms 6.69
>> Double256Vector.COSH 29.44 151.11 ops/ms 5.13
>> Double256Vector.EXP 86.67 564.68 ops/ms 6.52
>> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80
>> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62
>> Double256Vector.LOG 71.52 394.90 ops/ms 5.52
>> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54
>> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05
>> Double256Vector.SIN 57.06 380.98 ops/ms 6.68
>> Double256Vector.SINH 29.40 117.37 ops/ms 3.99
>> Double256Vector.TAN 44.90 279.90 ops/ms 6.23
>> Double256Vector.TANH 54.08 274.71 ops/ms 5.08
>> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35
>> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57
>> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04
>> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32
>> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69
>> Double512Vector.COS 59.88 837.04 ops/ms 13.98
>> Double512Vector.COSH 30.34 172.76 ops/ms 5.70
>> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14
>> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34
>> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34
>> Double512Vector.LOG 74.84 996.00 ops/ms 13.31
>> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72
>> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
>> Double512Vector.POW 37.42 384.13 ops/ms 10.26
>> Double512Vector.SIN 59.74 728.45 ops/ms 12.19
>> Double512Vector.SINH 29.47 143.38 ops/ms 4.87
>> Double512Vector.TAN 46.20 587.21 ops/ms 12.71
>> Double512Vector.TANH 57.36 495.42 ops/ms 8.64
>> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06
>> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16
>> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44
>> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28
>> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53
>> Double64Vector.COS 23.42 152.01 ops/ms 6.49
>> Double64Vector.COSH 17.34 113.34 ops/ms 6.54
>> Double64Vector.EXP 27.08 203.53 ops/ms 7.52
>> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15
>> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59
>> Double64Vector.LOG 26.75 142.63 ops/ms 5.33
>> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40
>> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38
>> Double64Vector.SIN 23.28 146.91 ops/ms 6.31
>> Double64Vector.SINH 17.62 88.59 ops/ms 5.03
>> Double64Vector.TAN 21.00 86.43 ops/ms 4.12
>> Double64Vector.TANH 23.75 111.35 ops/ms 4.69
>> Float128Vector.ACOS 57.52 110.65 ops/m

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]

2021-05-03 Thread Sandhya Viswanathan
On Mon, 3 May 2021 21:41:26 GMT, Paul Sandoz wrote:

>> Sandhya Viswanathan has updated the pull request with a new target base due 
>> to a merge or a rebase. The pull request now contains six commits:
>> 
>>  - Merge master
>>  - remove whitespace
>>  - Merge master
>>  - Small fix
>>  - cleanup
>>  - x86 short vector math optimization for Vector API
>
> Tier 1 to 3 tests pass for the default set of build profiles.

@PaulSandoz Thanks a lot for running through the tests.

-

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]

2021-05-03 Thread Paul Sandoz
On Wed, 28 Apr 2021 21:11:26 GMT, Sandhya Viswanathan wrote:

>> Intel Short Vector Math Library (SVML) based intrinsics in native x86
>> assembly provide optimized implementations of the Vector API transcendental
>> and trigonometric methods.
>> These methods are built into a separate library instead of being part of
>> libjvm.so or jvm.dll.
>> 
>> The following changes are made:
>>  - The source for these methods is placed in the jdk.incubator.vector
>>    module under src/jdk.incubator.vector/linux/native/libsvml and
>>    src/jdk.incubator.vector/windows/native/libsvml.
>>  - The assembly source files are named “*.S” and the include files are
>>    named “*.S.inc”.
>>  - The corresponding build script is placed at
>>    make/modules/jdk.incubator.vector/Lib.gmk.
>>  - Changes are made to the build system to support dependency tracking
>>    for assembly files with includes.
>>  - The built native libraries (libsvml.so/svml.dll) are placed in the bin
>>    directory of the JDK on Windows and the lib directory of the JDK on
>>    Linux.
>>  - The C2 JIT uses dll_load and dll_lookup to get the addresses of the
>>    optimized methods from this library.
>> 
>> Build system changes and module library build scripts are contributed by 
>> Magnus (magnus.ihse.bur...@oracle.com).
>> 
>> This work is part of the second round of incubation of the Vector API.
>> JEP: https://bugs.openjdk.java.net/browse/JDK-8261663
>> 
>> Please review.
>> 
>> Performance:
>> Microbenchmark Base Optimized Unit Gain (Optimized/Base)
>> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
>> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05
>> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
>> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79
>> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55
>> Double128Vector.COS 49.94 245.89 ops/ms 4.92
>> Double128Vector.COSH 26.91 126.00 ops/ms 4.68
>> Double128Vector.EXP 71.64 379.65 ops/ms 5.30
>> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18
>> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44
>> Double128Vector.LOG 61.95 279.84 ops/ms 4.52
>> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03
>> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79
>> Double128Vector.SIN 49.36 240.79 ops/ms 4.88
>> Double128Vector.SINH 26.59 103.75 ops/ms 3.90
>> Double128Vector.TAN 41.05 152.39 ops/ms 3.71
>> Double128Vector.TANH 45.29 169.53 ops/ms 3.74
>> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96
>> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01
>> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78
>> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44
>> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04
>> Double256Vector.COS 58.26 389.77 ops/ms 6.69
>> Double256Vector.COSH 29.44 151.11 ops/ms 5.13
>> Double256Vector.EXP 86.67 564.68 ops/ms 6.52
>> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80
>> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62
>> Double256Vector.LOG 71.52 394.90 ops/ms 5.52
>> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54
>> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05
>> Double256Vector.SIN 57.06 380.98 ops/ms 6.68
>> Double256Vector.SINH 29.40 117.37 ops/ms 3.99
>> Double256Vector.TAN 44.90 279.90 ops/ms 6.23
>> Double256Vector.TANH 54.08 274.71 ops/ms 5.08
>> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35
>> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57
>> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04
>> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32
>> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69
>> Double512Vector.COS 59.88 837.04 ops/ms 13.98
>> Double512Vector.COSH 30.34 172.76 ops/ms 5.70
>> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14
>> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34
>> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34
>> Double512Vector.LOG 74.84 996.00 ops/ms 13.31
>> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72
>> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
>> Double512Vector.POW 37.42 384.13 ops/ms 10.26
>> Double512Vector.SIN 59.74 728.45 ops/ms 12.19
>> Double512Vector.SINH 29.47 143.38 ops/ms 4.87
>> Double512Vector.TAN 46.20 587.21 ops/ms 12.71
>> Double512Vector.TANH 57.36 495.42 ops/ms 8.64
>> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06
>> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16
>> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44
>> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28
>> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53
>> Double64Vector.COS 23.42 152.01 ops/ms 6.49
>> Double64Vector.COSH 17.34 113.34 ops/ms 6.54
>> Double64Vector.EXP 27.08 203.53 ops/ms 7.52
>> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15
>> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59
>> Double64Vector.LOG 26.75 142.63 ops/ms 5.33
>> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40
>> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38
>> Double64Vector.SIN 23.28 146.91 ops/ms 6.31
>> Double64Vector.SINH 17.62 88.59 ops/ms 5.03
>> Double64Vector.TAN 21.00 86.43 ops/ms 4.12
>> Double64Vector.TANH 23.75 111.35 ops/ms 4.69
>> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92
>> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06
>> Float128Vector.ATAN 22.52 318.74 ops/

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]

2021-04-28 Thread Sandhya Viswanathan
> Intel Short Vector Math Library (SVML) based intrinsics in native x86
> assembly provide optimized implementations of the Vector API transcendental
> and trigonometric methods.
> These methods are built into a separate library instead of being part of
> libjvm.so or jvm.dll.
> 
> The following changes are made:
>  - The source for these methods is placed in the jdk.incubator.vector
>    module under src/jdk.incubator.vector/linux/native/libsvml and
>    src/jdk.incubator.vector/windows/native/libsvml.
>  - The assembly source files are named “*.S” and the include files are
>    named “*.S.inc”.
>  - The corresponding build script is placed at
>    make/modules/jdk.incubator.vector/Lib.gmk.
>  - Changes are made to the build system to support dependency tracking
>    for assembly files with includes.
>  - The built native libraries (libsvml.so/svml.dll) are placed in the bin
>    directory of the JDK on Windows and the lib directory of the JDK on
>    Linux.
>  - The C2 JIT uses dll_load and dll_lookup to get the addresses of the
>    optimized methods from this library.
> 
> Build system changes and module library build scripts are contributed by 
> Magnus (magnus.ihse.bur...@oracle.com).
> 
> This work is part of the second round of incubation of the Vector API.
> JEP: https://bugs.openjdk.java.net/browse/JDK-8261663
> 
> Please review.
> 
> Performance:
> Microbenchmark Base Optimized Unit Gain (Optimized/Base)
> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05
> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79
> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55
> Double128Vector.COS 49.94 245.89 ops/ms 4.92
> Double128Vector.COSH 26.91 126.00 ops/ms 4.68
> Double128Vector.EXP 71.64 379.65 ops/ms 5.30
> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18
> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44
> Double128Vector.LOG 61.95 279.84 ops/ms 4.52
> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03
> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79
> Double128Vector.SIN 49.36 240.79 ops/ms 4.88
> Double128Vector.SINH 26.59 103.75 ops/ms 3.90
> Double128Vector.TAN 41.05 152.39 ops/ms 3.71
> Double128Vector.TANH 45.29 169.53 ops/ms 3.74
> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96
> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01
> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78
> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44
> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04
> Double256Vector.COS 58.26 389.77 ops/ms 6.69
> Double256Vector.COSH 29.44 151.11 ops/ms 5.13
> Double256Vector.EXP 86.67 564.68 ops/ms 6.52
> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80
> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62
> Double256Vector.LOG 71.52 394.90 ops/ms 5.52
> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54
> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05
> Double256Vector.SIN 57.06 380.98 ops/ms 6.68
> Double256Vector.SINH 29.40 117.37 ops/ms 3.99
> Double256Vector.TAN 44.90 279.90 ops/ms 6.23
> Double256Vector.TANH 54.08 274.71 ops/ms 5.08
> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35
> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57
> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04
> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32
> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69
> Double512Vector.COS 59.88 837.04 ops/ms 13.98
> Double512Vector.COSH 30.34 172.76 ops/ms 5.70
> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14
> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34
> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34
> Double512Vector.LOG 74.84 996.00 ops/ms 13.31
> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72
> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
> Double512Vector.POW 37.42 384.13 ops/ms 10.26
> Double512Vector.SIN 59.74 728.45 ops/ms 12.19
> Double512Vector.SINH 29.47 143.38 ops/ms 4.87
> Double512Vector.TAN 46.20 587.21 ops/ms 12.71
> Double512Vector.TANH 57.36 495.42 ops/ms 8.64
> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06
> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16
> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44
> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28
> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53
> Double64Vector.COS 23.42 152.01 ops/ms 6.49
> Double64Vector.COSH 17.34 113.34 ops/ms 6.54
> Double64Vector.EXP 27.08 203.53 ops/ms 7.52
> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15
> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59
> Double64Vector.LOG 26.75 142.63 ops/ms 5.33
> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40
> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38
> Double64Vector.SIN 23.28 146.91 ops/ms 6.31
> Double64Vector.SINH 17.62 88.59 ops/ms 5.03
> Double64Vector.TAN 21.00 86.43 ops/ms 4.12
> Double64Vector.TANH 23.75 111.35 ops/ms 4.69
> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92
> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06
> Float128Vector.ATAN 22.52 318.74 ops/ms 14.15
> Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42
> Float128Vector.CBRT 29.72 443.74 ops/ms 14.93
> Float128Vector.COS 42.82 803.02 ops/ms 18.75
> Float128Vect

RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics

2021-04-22 Thread Sandhya Viswanathan
Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly
provide optimized implementations of the Vector API transcendental and
trigonometric methods.
These methods are built into a separate library instead of being part of
libjvm.so or jvm.dll.

The following changes are made:
 - The source for these methods is placed in the jdk.incubator.vector module
   under src/jdk.incubator.vector/linux/native/libsvml and
   src/jdk.incubator.vector/windows/native/libsvml.
 - The assembly source files are named “*.S” and the include files are named
   “*.S.inc”.
 - The corresponding build script is placed at
   make/modules/jdk.incubator.vector/Lib.gmk.
 - Changes are made to the build system to support dependency tracking for
   assembly files with includes.
 - The built native libraries (libsvml.so/svml.dll) are placed in the bin
   directory of the JDK on Windows and the lib directory of the JDK on Linux.
 - The C2 JIT uses dll_load and dll_lookup to get the addresses of the
   optimized methods from this library.

Build system changes and module library build scripts are contributed by Magnus 
(magnus.ihse.bur...@oracle.com).

This work is part of the second round of incubation of the Vector API.
JEP: https://bugs.openjdk.java.net/browse/JDK-8261663

Please review.

Performance:
Microbenchmark Base Optimized Unit Gain (Optimized/Base)
Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
Double128Vector.ASIN 45.06 92.36 ops/ms 2.05
Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79
Double128Vector.CBRT 45.77 208.36 ops/ms 4.55
Double128Vector.COS 49.94 245.89 ops/ms 4.92
Double128Vector.COSH 26.91 126.00 ops/ms 4.68
Double128Vector.EXP 71.64 379.65 ops/ms 5.30
Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18
Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44
Double128Vector.LOG 61.95 279.84 ops/ms 4.52
Double128Vector.LOG10 59.34 239.05 ops/ms 4.03
Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79
Double128Vector.SIN 49.36 240.79 ops/ms 4.88
Double128Vector.SINH 26.59 103.75 ops/ms 3.90
Double128Vector.TAN 41.05 152.39 ops/ms 3.71
Double128Vector.TANH 45.29 169.53 ops/ms 3.74
Double256Vector.ACOS 54.21 106.39 ops/ms 1.96
Double256Vector.ASIN 53.60 107.99 ops/ms 2.01
Double256Vector.ATAN 21.53 189.11 ops/ms 8.78
Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44
Double256Vector.CBRT 56.45 397.13 ops/ms 7.04
Double256Vector.COS 58.26 389.77 ops/ms 6.69
Double256Vector.COSH 29.44 151.11 ops/ms 5.13
Double256Vector.EXP 86.67 564.68 ops/ms 6.52
Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80
Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62
Double256Vector.LOG 71.52 394.90 ops/ms 5.52
Double256Vector.LOG10 65.43 362.32 ops/ms 5.54
Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05
Double256Vector.SIN 57.06 380.98 ops/ms 6.68
Double256Vector.SINH 29.40 117.37 ops/ms 3.99
Double256Vector.TAN 44.90 279.90 ops/ms 6.23
Double256Vector.TANH 54.08 274.71 ops/ms 5.08
Double512Vector.ACOS 55.65 687.54 ops/ms 12.35
Double512Vector.ASIN 57.31 777.72 ops/ms 13.57
Double512Vector.ATAN 21.42 729.21 ops/ms 34.04
Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32
Double512Vector.CBRT 56.78 834.38 ops/ms 14.69
Double512Vector.COS 59.88 837.04 ops/ms 13.98
Double512Vector.COSH 30.34 172.76 ops/ms 5.70
Double512Vector.EXP 99.66 1608.12 ops/ms 16.14
Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34
Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34
Double512Vector.LOG 74.84 996.00 ops/ms 13.31
Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72
Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
Double512Vector.POW 37.42 384.13 ops/ms 10.26
Double512Vector.SIN 59.74 728.45 ops/ms 12.19
Double512Vector.SINH 29.47 143.38 ops/ms 4.87
Double512Vector.TAN 46.20 587.21 ops/ms 12.71
Double512Vector.TANH 57.36 495.42 ops/ms 8.64
Double64Vector.ACOS 24.04 73.67 ops/ms 3.06
Double64Vector.ASIN 23.78 75.11 ops/ms 3.16
Double64Vector.ATAN 14.14 62.81 ops/ms 4.44
Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28
Double64Vector.CBRT 16.47 107.50 ops/ms 6.53
Double64Vector.COS 23.42 152.01 ops/ms 6.49
Double64Vector.COSH 17.34 113.34 ops/ms 6.54
Double64Vector.EXP 27.08 203.53 ops/ms 7.52
Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15
Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59
Double64Vector.LOG 26.75 142.63 ops/ms 5.33
Double64Vector.LOG10 25.85 139.71 ops/ms 5.40
Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38
Double64Vector.SIN 23.28 146.91 ops/ms 6.31
Double64Vector.SINH 17.62 88.59 ops/ms 5.03
Double64Vector.TAN 21.00 86.43 ops/ms 4.12
Double64Vector.TANH 23.75 111.35 ops/ms 4.69
Float128Vector.ACOS 57.52 110.65 ops/ms 1.92
Float128Vector.ASIN 57.15 117.95 ops/ms 2.06
Float128Vector.ATAN 22.52 318.74 ops/ms 14.15
Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42
Float128Vector.CBRT 29.72 443.74 ops/ms 14.93
Float128Vector.COS 42.82 803.02 ops/ms 18.75
Float128Vector.COSH 31.44 118.34 ops/ms 3.76
Float128Vector.EXP 72.43 855.33 ops/ms 11.81
Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38
Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12
Float128Vector.LOG 52.95 877.94 ops/ms 16.5
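
For readers checking the table, the Gain column is the Optimized throughput divided by the Base throughput, both in ops/ms and rounded to two decimals. The sketch below recomputes it for two rows copied from the table. The class and method names (GainCheck, gain, format) are made up for illustration and are not part of the PR or the benchmark harness.

```java
import java.util.Locale;

public class GainCheck {
    /** Gain as defined by the table header: optimized throughput / base throughput. */
    static double gain(double baseOpsPerMs, double optimizedOpsPerMs) {
        return optimizedOpsPerMs / baseOpsPerMs;
    }

    /** Formats a gain with two decimals, the precision used in the table. */
    static String format(double gain) {
        return String.format(Locale.ROOT, "%.2f", gain);
    }

    public static void main(String[] args) {
        // Base and Optimized values copied from the rows
        // Double128Vector.ACOS and Double512Vector.LOG1P.
        System.out.println("Double128Vector.ACOS  " + format(gain(45.91, 87.34)));
        System.out.println("Double512Vector.LOG1P " + format(gain(19.75, 776.87)));
    }
}
```

The two printed values should match the Gain entries of the corresponding rows (1.90 and 39.34).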