Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v17]
> This PR contains Short Vector Math Library support related changes for > [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), > in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 > assembly provide optimized implementation for Vector API transcendental and > trigonometric methods. > These methods are built into a separate library instead of being part of > libjvm.so or jvm.dll. > > The following changes are made: >The source for these methods is placed in the jdk.incubator.vector module > under src/jdk.incubator.vector/linux/native/libsvml and > src/jdk.incubator.vector/windows/native/libsvml. >The assembly source files are named as “*.S” and include files are named > as “*.S.inc”. >The corresponding build script is placed at > make/modules/jdk.incubator.vector/Lib.gmk. >Changes are made to build system to support dependency tracking for > assembly files with includes. >The built native libraries (libsvml.so/svml.dll) are placed in bin > directory of JDK on Windows and lib directory of JDK on Linux. >The C2 JIT uses the dll_load and dll_lookup to get the addresses of > optimized methods from this library. > > Build system changes and module library build scripts are contributed by > Magnus (magnus.ihse.bur...@oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vecto
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v16]
> This PR contains Short Vector Math Library support related changes for > [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), > in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 > assembly provide optimized implementation for Vector API transcendental and > trigonometric methods. > These methods are built into a separate library instead of being part of > libjvm.so or jvm.dll. > > The following changes are made: >The source for these methods is placed in the jdk.incubator.vector module > under src/jdk.incubator.vector/linux/native/libsvml and > src/jdk.incubator.vector/windows/native/libsvml. >The assembly source files are named as “*.S” and include files are named > as “*.S.inc”. >The corresponding build script is placed at > make/modules/jdk.incubator.vector/Lib.gmk. >Changes are made to build system to support dependency tracking for > assembly files with includes. >The built native libraries (libsvml.so/svml.dll) are placed in bin > directory of JDK on Windows and lib directory of JDK on Linux. >The C2 JIT uses the dll_load and dll_lookup to get the addresses of > optimized methods from this library. > > Build system changes and module library build scripts are contributed by > Magnus (magnus.ihse.bur...@oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vecto
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v15]
> This PR contains Short Vector Math Library support related changes for > [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), > in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 > assembly provide optimized implementation for Vector API transcendental and > trigonometric methods. > These methods are built into a separate library instead of being part of > libjvm.so or jvm.dll. > > The following changes are made: >The source for these methods is placed in the jdk.incubator.vector module > under src/jdk.incubator.vector/linux/native/libsvml and > src/jdk.incubator.vector/windows/native/libsvml. >The assembly source files are named as “*.S” and include files are named > as “*.S.inc”. >The corresponding build script is placed at > make/modules/jdk.incubator.vector/Lib.gmk. >Changes are made to build system to support dependency tracking for > assembly files with includes. >The built native libraries (libsvml.so/svml.dll) are placed in bin > directory of JDK on Windows and lib directory of JDK on Linux. >The C2 JIT uses the dll_load and dll_lookup to get the addresses of > optimized methods from this library. > > Build system changes and module library build scripts are contributed by > Magnus (magnus.ihse.bur...@oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vecto
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v14]
> This PR contains Short Vector Math Library support related changes for > [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), > in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 > assembly provide optimized implementation for Vector API transcendental and > trigonometric methods. > These methods are built into a separate library instead of being part of > libjvm.so or jvm.dll. > > The following changes are made: >The source for these methods is placed in the jdk.incubator.vector module > under src/jdk.incubator.vector/linux/native/libsvml and > src/jdk.incubator.vector/windows/native/libsvml. >The assembly source files are named as “*.S” and include files are named > as “*.S.inc”. >The corresponding build script is placed at > make/modules/jdk.incubator.vector/Lib.gmk. >Changes are made to build system to support dependency tracking for > assembly files with includes. >The built native libraries (libsvml.so/svml.dll) are placed in bin > directory of JDK on Windows and lib directory of JDK on Linux. >The C2 JIT uses the dll_load and dll_lookup to get the addresses of > optimized methods from this library. > > Build system changes and module library build scripts are contributed by > Magnus (magnus.ihse.bur...@oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vecto
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v13]
> This PR contains Short Vector Math Library support related changes for > [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), > in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 > assembly provide optimized implementation for Vector API transcendental and > trigonometric methods. > These methods are built into a separate library instead of being part of > libjvm.so or jvm.dll. > > The following changes are made: >The source for these methods is placed in the jdk.incubator.vector module > under src/jdk.incubator.vector/linux/native/libsvml and > src/jdk.incubator.vector/windows/native/libsvml. >The assembly source files are named as “*.S” and include files are named > as “*.S.inc”. >The corresponding build script is placed at > make/modules/jdk.incubator.vector/Lib.gmk. >Changes are made to build system to support dependency tracking for > assembly files with includes. >The built native libraries (libsvml.so/svml.dll) are placed in bin > directory of JDK on Windows and lib directory of JDK on Linux. >The C2 JIT uses the dll_load and dll_lookup to get the addresses of > optimized methods from this library. > > Build system changes and module library build scripts are contributed by > Magnus (magnus.ihse.bur...@oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vecto
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v12]
> This PR contains Short Vector Math Library support related changes for > [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), > in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 > assembly provide optimized implementation for Vector API transcendental and > trigonometric methods. > These methods are built into a separate library instead of being part of > libjvm.so or jvm.dll. > > The following changes are made: >The source for these methods is placed in the jdk.incubator.vector module > under src/jdk.incubator.vector/linux/native/libsvml and > src/jdk.incubator.vector/windows/native/libsvml. >The assembly source files are named as “*.S” and include files are named > as “*.S.inc”. >The corresponding build script is placed at > make/modules/jdk.incubator.vector/Lib.gmk. >Changes are made to build system to support dependency tracking for > assembly files with includes. >The built native libraries (libsvml.so/svml.dll) are placed in bin > directory of JDK on Windows and lib directory of JDK on Linux. >The C2 JIT uses the dll_load and dll_lookup to get the addresses of > optimized methods from this library. > > Build system changes and module library build scripts are contributed by > Magnus (magnus.ihse.bur...@oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vecto
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v11]
On Wed, 19 May 2021 22:16:18 GMT, Sandhya Viswanathan wrote: >> This PR contains Short Vector Math Library support related changes for >> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), >> in preparation for when targeted. >> >> Intel Short Vector Math Library (SVML) based intrinsics in native x86 >> assembly provide optimized implementation for Vector API transcendental and >> trigonometric methods. >> These methods are built into a separate library instead of being part of >> libjvm.so or jvm.dll. >> >> The following changes are made: >>The source for these methods is placed in the jdk.incubator.vector module >> under src/jdk.incubator.vector/linux/native/libsvml and >> src/jdk.incubator.vector/windows/native/libsvml. >>The assembly source files are named as “*.S” and include files are named >> as “*.S.inc”. >>The corresponding build script is placed at >> make/modules/jdk.incubator.vector/Lib.gmk. >>Changes are made to build system to support dependency tracking for >> assembly files with includes. >>The built native libraries (libsvml.so/svml.dll) are placed in bin >> directory of JDK on Windows and lib directory of JDK on Linux. >>The C2 JIT uses the dll_load and dll_lookup to get the addresses of >> optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by >> Magnus (magnus.ihse.bur...@oracle.com). >> >> Looking forward to your review and feedback. >> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/m
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v11]
> This PR contains Short Vector Math Library support related changes for > [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), > in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 > assembly provide optimized implementation for Vector API transcendental and > trigonometric methods. > These methods are built into a separate library instead of being part of > libjvm.so or jvm.dll. > > The following changes are made: >The source for these methods is placed in the jdk.incubator.vector module > under src/jdk.incubator.vector/linux/native/libsvml and > src/jdk.incubator.vector/windows/native/libsvml. >The assembly source files are named as “*.S” and include files are named > as “*.S.inc”. >The corresponding build script is placed at > make/modules/jdk.incubator.vector/Lib.gmk. >Changes are made to build system to support dependency tracking for > assembly files with includes. >The built native libraries (libsvml.so/svml.dll) are placed in bin > directory of JDK on Windows and lib directory of JDK on Linux. >The C2 JIT uses the dll_load and dll_lookup to get the addresses of > optimized methods from this library. > > Build system changes and module library build scripts are contributed by > Magnus (magnus.ihse.bur...@oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vecto
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]
On Wed, 19 May 2021 22:02:14 GMT, Paul Sandoz wrote: >> Tier 1 to 3 tests pass for the default set of build profiles. > >> Thanks a lot for the review @PaulSandoz @iwanowww @erikj79. >> Paul and Vladimir, I have implemented your review comments. Please take a >> look. > > `case VECTOR_OP_OR` is still present. @PaulSandoz Thanks for pointing that out. I had missed git add for some of the files. - PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]
On Mon, 3 May 2021 21:41:26 GMT, Paul Sandoz wrote: >> Sandhya Viswanathan has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains six commits: >> >> - Merge master >> - remove whitespace >> - Merge master >> - Small fix >> - cleanup >> - x86 short vector math optimization for Vector API > > Tier 1 to 3 tests pass for the default set of build profiles. > Thanks a lot for the review @PaulSandoz @iwanowww @erikj79. > Paul and Vladimir, I have implemented your review comments. Please take a > look. `case VECTOR_OP_OR` is still present. - PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]
On Mon, 3 May 2021 21:41:26 GMT, Paul Sandoz wrote: >> Sandhya Viswanathan has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains six commits: >> >> - Merge master >> - remove whitespace >> - Merge master >> - Small fix >> - cleanup >> - x86 short vector math optimization for Vector API > > Tier 1 to 3 tests pass for the default set of build profiles. Thanks a lot for the review @PaulSandoz @iwanowww @erikj79. Paul and Vladimir, I have implemented your review comments. Please take a look. - PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v10]
> This PR contains Short Vector Math Library support related changes for > [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), > in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 > assembly provide optimized implementation for Vector API transcendental and > trigonometric methods. > These methods are built into a separate library instead of being part of > libjvm.so or jvm.dll. > > The following changes are made: >The source for these methods is placed in the jdk.incubator.vector module > under src/jdk.incubator.vector/linux/native/libsvml and > src/jdk.incubator.vector/windows/native/libsvml. >The assembly source files are named as “*.S” and include files are named > as “*.S.inc”. >The corresponding build script is placed at > make/modules/jdk.incubator.vector/Lib.gmk. >Changes are made to build system to support dependency tracking for > assembly files with includes. >The built native libraries (libsvml.so/svml.dll) are placed in bin > directory of JDK on Windows and lib directory of JDK on Linux. >The C2 JIT uses the dll_load and dll_lookup to get the addresses of > optimized methods from this library. > > Build system changes and module library build scripts are contributed by > Magnus (magnus.ihse.bur...@oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vecto
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v9]
On Wed, 19 May 2021 03:37:11 GMT, Sandhya Viswanathan wrote: >> This PR contains Short Vector Math Library support related changes for >> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), >> in preparation for when targeted. >> >> Intel Short Vector Math Library (SVML) based intrinsics in native x86 >> assembly provide optimized implementation for Vector API transcendental and >> trigonometric methods. >> These methods are built into a separate library instead of being part of >> libjvm.so or jvm.dll. >> >> The following changes are made: >>The source for these methods is placed in the jdk.incubator.vector module >> under src/jdk.incubator.vector/linux/native/libsvml and >> src/jdk.incubator.vector/windows/native/libsvml. >>The assembly source files are named as “*.S” and include files are named >> as “*.S.inc”. >>The corresponding build script is placed at >> make/modules/jdk.incubator.vector/Lib.gmk. >>Changes are made to build system to support dependency tracking for >> assembly files with includes. >>The built native libraries (libsvml.so/svml.dll) are placed in bin >> directory of JDK on Windows and lib directory of JDK on Linux. >>The C2 JIT uses the dll_load and dll_lookup to get the addresses of >> optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by >> Magnus (magnus.ihse.bur...@oracle.com). >> >> Looking forward to your review and feedback. >> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/m
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v9]
On Wed, 19 May 2021 03:37:11 GMT, Sandhya Viswanathan wrote: >> This PR contains Short Vector Math Library support related changes for >> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), >> in preparation for when targeted. >> >> Intel Short Vector Math Library (SVML) based intrinsics in native x86 >> assembly provide optimized implementation for Vector API transcendental and >> trigonometric methods. >> These methods are built into a separate library instead of being part of >> libjvm.so or jvm.dll. >> >> The following changes are made: >>The source for these methods is placed in the jdk.incubator.vector module >> under src/jdk.incubator.vector/linux/native/libsvml and >> src/jdk.incubator.vector/windows/native/libsvml. >>The assembly source files are named as “*.S” and include files are named >> as “*.S.inc”. >>The corresponding build script is placed at >> make/modules/jdk.incubator.vector/Lib.gmk. >>Changes are made to build system to support dependency tracking for >> assembly files with includes. >>The built native libraries (libsvml.so/svml.dll) are placed in bin >> directory of JDK on Windows and lib directory of JDK on Linux. >>The C2 JIT uses the dll_load and dll_lookup to get the addresses of >> optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by >> Magnus (magnus.ihse.bur...@oracle.com). >> >> Looking forward to your review and feedback. >> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/m
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v9]
> This PR contains Short Vector Math Library support related changes for > [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), > in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 > assembly provide optimized implementation for Vector API transcendental and > trigonometric methods. > These methods are built into a separate library instead of being part of > libjvm.so or jvm.dll. > > The following changes are made: >The source for these methods is placed in the jdk.incubator.vector module > under src/jdk.incubator.vector/linux/native/libsvml and > src/jdk.incubator.vector/windows/native/libsvml. >The assembly source files are named as “*.S” and include files are named > as “*.S.inc”. >The corresponding build script is placed at > make/modules/jdk.incubator.vector/Lib.gmk. >Changes are made to build system to support dependency tracking for > assembly files with includes. >The built native libraries (libsvml.so/svml.dll) are placed in bin > directory of JDK on Windows and lib directory of JDK on Linux. >The C2 JIT uses the dll_load and dll_lookup to get the addresses of > optimized methods from this library. > > Build system changes and module library build scripts are contributed by > Magnus (magnus.ihse.bur...@oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vecto
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v8]
On Wed, 19 May 2021 00:58:15 GMT, Sandhya Viswanathan wrote: >> This PR contains Short Vector Math Library support related changes for >> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), >> in preparation for when targeted. >> >> Intel Short Vector Math Library (SVML) based intrinsics in native x86 >> assembly provide optimized implementation for Vector API transcendental and >> trigonometric methods. >> These methods are built into a separate library instead of being part of >> libjvm.so or jvm.dll. >> >> The following changes are made: >>The source for these methods is placed in the jdk.incubator.vector module >> under src/jdk.incubator.vector/linux/native/libsvml and >> src/jdk.incubator.vector/windows/native/libsvml. >>The assembly source files are named as “*.S” and include files are named >> as “*.S.inc”. >>The corresponding build script is placed at >> make/modules/jdk.incubator.vector/Lib.gmk. >>Changes are made to build system to support dependency tracking for >> assembly files with includes. >>The built native libraries (libsvml.so/svml.dll) are placed in bin >> directory of JDK on Windows and lib directory of JDK on Linux. >>The C2 JIT uses the dll_load and dll_lookup to get the addresses of >> optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by >> Magnus (magnus.ihse.bur...@oracle.com). >> >> Looking forward to your review and feedback. >> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/m
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v8]
On Wed, 19 May 2021 00:58:15 GMT, Sandhya Viswanathan wrote: >> This PR contains Short Vector Math Library support related changes for >> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), >> in preparation for when targeted. >> >> Intel Short Vector Math Library (SVML) based intrinsics in native x86 >> assembly provide optimized implementation for Vector API transcendental and >> trigonometric methods. >> These methods are built into a separate library instead of being part of >> libjvm.so or jvm.dll. >> >> The following changes are made: >>The source for these methods is placed in the jdk.incubator.vector module >> under src/jdk.incubator.vector/linux/native/libsvml and >> src/jdk.incubator.vector/windows/native/libsvml. >>The assembly source files are named as “*.S” and include files are named >> as “*.S.inc”. >>The corresponding build script is placed at >> make/modules/jdk.incubator.vector/Lib.gmk. >>Changes are made to build system to support dependency tracking for >> assembly files with includes. >>The built native libraries (libsvml.so/svml.dll) are placed in bin >> directory of JDK on Windows and lib directory of JDK on Linux. >>The C2 JIT uses the dll_load and dll_lookup to get the addresses of >> optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by >> Magnus (magnus.ihse.bur...@oracle.com). >> >> Looking forward to your review and feedback. >> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/m
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v7]
On Wed, 19 May 2021 00:26:48 GMT, Vladimir Kozlov wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one >> additional commit since the last revision: >> >> jcheck fixes > > This is much much better! Thank you for changing it. I am only asking now to > add comment explaining names. @vnkozlov I have added comments explaining naming convention. Please let me know if this looks ok. - PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v8]
> This PR contains Short Vector Math Library support related changes for > [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), > in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 > assembly provide optimized implementation for Vector API transcendental and > trigonometric methods. > These methods are built into a separate library instead of being part of > libjvm.so or jvm.dll. > > The following changes are made: >The source for these methods is placed in the jdk.incubator.vector module > under src/jdk.incubator.vector/linux/native/libsvml and > src/jdk.incubator.vector/windows/native/libsvml. >The assembly source files are named as “*.S” and include files are named > as “*.S.inc”. >The corresponding build script is placed at > make/modules/jdk.incubator.vector/Lib.gmk. >Changes are made to build system to support dependency tracking for > assembly files with includes. >The built native libraries (libsvml.so/svml.dll) are placed in bin > directory of JDK on Windows and lib directory of JDK on Linux. >The C2 JIT uses the dll_load and dll_lookup to get the addresses of > optimized methods from this library. > > Build system changes and module library build scripts are contributed by > Magnus (magnus.ihse.bur...@oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vecto
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v7]
On Tue, 18 May 2021 23:59:28 GMT, Sandhya Viswanathan wrote: >> This PR contains Short Vector Math Library support related changes for >> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), >> in preparation for when targeted. >> >> Intel Short Vector Math Library (SVML) based intrinsics in native x86 >> assembly provide optimized implementation for Vector API transcendental and >> trigonometric methods. >> These methods are built into a separate library instead of being part of >> libjvm.so or jvm.dll. >> >> The following changes are made: >>The source for these methods is placed in the jdk.incubator.vector module >> under src/jdk.incubator.vector/linux/native/libsvml and >> src/jdk.incubator.vector/windows/native/libsvml. >>The assembly source files are named as “*.S” and include files are named >> as “*.S.inc”. >>The corresponding build script is placed at >> make/modules/jdk.incubator.vector/Lib.gmk. >>Changes are made to build system to support dependency tracking for >> assembly files with includes. >>The built native libraries (libsvml.so/svml.dll) are placed in bin >> directory of JDK on Windows and lib directory of JDK on Linux. >>The C2 JIT uses the dll_load and dll_lookup to get the addresses of >> optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by >> Magnus (magnus.ihse.bur...@oracle.com). >> >> Looking forward to your review and feedback. >> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/m
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v7]
> This PR contains Short Vector Math Library support related changes for > [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), > in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 > assembly provide optimized implementation for Vector API transcendental and > trigonometric methods. > These methods are built into a separate library instead of being part of > libjvm.so or jvm.dll. > > The following changes are made: >The source for these methods is placed in the jdk.incubator.vector module > under src/jdk.incubator.vector/linux/native/libsvml and > src/jdk.incubator.vector/windows/native/libsvml. >The assembly source files are named as “*.S” and include files are named > as “*.S.inc”. >The corresponding build script is placed at > make/modules/jdk.incubator.vector/Lib.gmk. >Changes are made to build system to support dependency tracking for > assembly files with includes. >The built native libraries (libsvml.so/svml.dll) are placed in bin > directory of JDK on Windows and lib directory of JDK on Linux. >The C2 JIT uses the dll_load and dll_lookup to get the addresses of > optimized methods from this library. > > Build system changes and module library build scripts are contributed by > Magnus (magnus.ihse.bur...@oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vecto
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v6]
On Tue, 18 May 2021 23:43:13 GMT, Sandhya Viswanathan wrote: >> This PR contains Short Vector Math Library support related changes for >> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), >> in preparation for when targeted. >> >> Intel Short Vector Math Library (SVML) based intrinsics in native x86 >> assembly provide optimized implementation for Vector API transcendental and >> trigonometric methods. >> These methods are built into a separate library instead of being part of >> libjvm.so or jvm.dll. >> >> The following changes are made: >>The source for these methods is placed in the jdk.incubator.vector module >> under src/jdk.incubator.vector/linux/native/libsvml and >> src/jdk.incubator.vector/windows/native/libsvml. >>The assembly source files are named as “*.S” and include files are named >> as “*.S.inc”. >>The corresponding build script is placed at >> make/modules/jdk.incubator.vector/Lib.gmk. >>Changes are made to build system to support dependency tracking for >> assembly files with includes. >>The built native libraries (libsvml.so/svml.dll) are placed in bin >> directory of JDK on Windows and lib directory of JDK on Linux. >>The C2 JIT uses the dll_load and dll_lookup to get the addresses of >> optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by >> Magnus (magnus.ihse.bur...@oracle.com). >> >> Looking forward to your review and feedback. >> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/m
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v6]
> This PR contains Short Vector Math Library support related changes for > [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), > in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 > assembly provide optimized implementation for Vector API transcendental and > trigonometric methods. > These methods are built into a separate library instead of being part of > libjvm.so or jvm.dll. > > The following changes are made: >The source for these methods is placed in the jdk.incubator.vector module > under src/jdk.incubator.vector/linux/native/libsvml and > src/jdk.incubator.vector/windows/native/libsvml. >The assembly source files are named as “*.S” and include files are named > as “*.S.inc”. >The corresponding build script is placed at > make/modules/jdk.incubator.vector/Lib.gmk. >Changes are made to build system to support dependency tracking for > assembly files with includes. >The built native libraries (libsvml.so/svml.dll) are placed in bin > directory of JDK on Windows and lib directory of JDK on Linux. >The C2 JIT uses the dll_load and dll_lookup to get the addresses of > optimized methods from this library. > > Build system changes and module library build scripts are contributed by > Magnus (magnus.ihse.bur...@oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vecto
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v5]
> This PR contains Short Vector Math Library support related changes for > [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), > in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 > assembly provide optimized implementation for Vector API transcendental and > trigonometric methods. > These methods are built into a separate library instead of being part of > libjvm.so or jvm.dll. > > The following changes are made: >The source for these methods is placed in the jdk.incubator.vector module > under src/jdk.incubator.vector/linux/native/libsvml and > src/jdk.incubator.vector/windows/native/libsvml. >The assembly source files are named as “*.S” and include files are named > as “*.S.inc”. >The corresponding build script is placed at > make/modules/jdk.incubator.vector/Lib.gmk. >Changes are made to build system to support dependency tracking for > assembly files with includes. >The built native libraries (libsvml.so/svml.dll) are placed in bin > directory of JDK on Windows and lib directory of JDK on Linux. >The C2 JIT uses the dll_load and dll_lookup to get the addresses of > optimized methods from this library. > > Build system changes and module library build scripts are contributed by > Magnus (magnus.ihse.bur...@oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vecto
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v4]
On Sat, 15 May 2021 02:06:29 GMT, Sandhya Viswanathan wrote: >> This PR contains Short Vector Math Library support related changes for >> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), >> in preparation for when targeted. >> >> Intel Short Vector Math Library (SVML) based intrinsics in native x86 >> assembly provide optimized implementation for Vector API transcendental and >> trigonometric methods. >> These methods are built into a separate library instead of being part of >> libjvm.so or jvm.dll. >> >> The following changes are made: >>The source for these methods is placed in the jdk.incubator.vector module >> under src/jdk.incubator.vector/linux/native/libsvml and >> src/jdk.incubator.vector/windows/native/libsvml. >>The assembly source files are named as “*.S” and include files are named >> as “*.S.inc”. >>The corresponding build script is placed at >> make/modules/jdk.incubator.vector/Lib.gmk. >>Changes are made to build system to support dependency tracking for >> assembly files with includes. >>The built native libraries (libsvml.so/svml.dll) are placed in bin >> directory of JDK on Windows and lib directory of JDK on Linux. >>The C2 JIT uses the dll_load and dll_lookup to get the addresses of >> optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by >> Magnus (magnus.ihse.bur...@oracle.com). >> >> Looking forward to your review and feedback. >> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/m
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v4]
On Sat, 15 May 2021 02:06:29 GMT, Sandhya Viswanathan wrote: >> This PR contains Short Vector Math Library support related changes for >> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), >> in preparation for when targeted. >> >> Intel Short Vector Math Library (SVML) based intrinsics in native x86 >> assembly provide optimized implementation for Vector API transcendental and >> trigonometric methods. >> These methods are built into a separate library instead of being part of >> libjvm.so or jvm.dll. >> >> The following changes are made: >>The source for these methods is placed in the jdk.incubator.vector module >> under src/jdk.incubator.vector/linux/native/libsvml and >> src/jdk.incubator.vector/windows/native/libsvml. >>The assembly source files are named as “*.S” and include files are named >> as “*.S.inc”. >>The corresponding build script is placed at >> make/modules/jdk.incubator.vector/Lib.gmk. >>Changes are made to build system to support dependency tracking for >> assembly files with includes. >>The built native libraries (libsvml.so/svml.dll) are placed in bin >> directory of JDK on Windows and lib directory of JDK on Linux. >>The C2 JIT uses the dll_load and dll_lookup to get the addresses of >> optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by >> Magnus (magnus.ihse.bur...@oracle.com). >> >> Looking forward to your review and feedback. >> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/m
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v4]
> This PR contains Short Vector Math Library support related changes for > [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), > in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 > assembly provide optimized implementation for Vector API transcendental and > trigonometric methods. > These methods are built into a separate library instead of being part of > libjvm.so or jvm.dll. > > The following changes are made: >The source for these methods is placed in the jdk.incubator.vector module > under src/jdk.incubator.vector/linux/native/libsvml and > src/jdk.incubator.vector/windows/native/libsvml. >The assembly source files are named as “*.S” and include files are named > as “*.S.inc”. >The corresponding build script is placed at > make/modules/jdk.incubator.vector/Lib.gmk. >Changes are made to build system to support dependency tracking for > assembly files with includes. >The built native libraries (libsvml.so/svml.dll) are placed in bin > directory of JDK on Windows and lib directory of JDK on Linux. >The C2 JIT uses the dll_load and dll_lookup to get the addresses of > optimized methods from this library. > > Build system changes and module library build scripts are contributed by > Magnus (magnus.ihse.bur...@oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vecto
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v3]
> This PR contains Short Vector Math Library support related changes for > [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), > in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 > assembly provide optimized implementation for Vector API transcendental and > trigonometric methods. > These methods are built into a separate library instead of being part of > libjvm.so or jvm.dll. > > The following changes are made: >The source for these methods is placed in the jdk.incubator.vector module > under src/jdk.incubator.vector/linux/native/libsvml and > src/jdk.incubator.vector/windows/native/libsvml. >The assembly source files are named as “*.S” and include files are named > as “*.S.inc”. >The corresponding build script is placed at > make/modules/jdk.incubator.vector/Lib.gmk. >Changes are made to build system to support dependency tracking for > assembly files with includes. >The built native libraries (libsvml.so/svml.dll) are placed in bin > directory of JDK on Windows and lib directory of JDK on Linux. >The C2 JIT uses the dll_load and dll_lookup to get the addresses of > optimized methods from this library. > > Build system changes and module library build scripts are contributed by > Magnus (magnus.ihse.bur...@oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vecto
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]
On Wed, 28 Apr 2021 21:11:26 GMT, Sandhya Viswanathan wrote: >> This PR contains Short Vector Math Library support related changes for >> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), >> in preparation for when targeted. >> >> Intel Short Vector Math Library (SVML) based intrinsics in native x86 >> assembly provide optimized implementation for Vector API transcendental and >> trigonometric methods. >> These methods are built into a separate library instead of being part of >> libjvm.so or jvm.dll. >> >> The following changes are made: >>The source for these methods is placed in the jdk.incubator.vector module >> under src/jdk.incubator.vector/linux/native/libsvml and >> src/jdk.incubator.vector/windows/native/libsvml. >>The assembly source files are named as “*.S” and include files are named >> as “*.S.inc”. >>The corresponding build script is placed at >> make/modules/jdk.incubator.vector/Lib.gmk. >>Changes are made to build system to support dependency tracking for >> assembly files with includes. >>The built native libraries (libsvml.so/svml.dll) are placed in bin >> directory of JDK on Windows and lib directory of JDK on Linux. >>The C2 JIT uses the dll_load and dll_lookup to get the addresses of >> optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by >> Magnus (magnus.ihse.bur...@oracle.com). >> >> Looking forward to your review and feedback. >> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/m
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]
On Mon, 3 May 2021 21:41:26 GMT, Paul Sandoz wrote: >> Sandhya Viswanathan has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains six commits: >> >> - Merge master >> - remove whitespace >> - Merge master >> - Small fix >> - cleanup >> - x86 short vector math optimization for Vector API > > Tier 1 to 3 tests pass for the default set of build profiles. @PaulSandoz Thanks a lot for running through the tests. - PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]
On Wed, 28 Apr 2021 21:11:26 GMT, Sandhya Viswanathan wrote: >> Intel Short Vector Math Library (SVML) based intrinsics in native x86 >> assembly provide optimized implementation for Vector API transcendental and >> trigonometric methods. >> These methods are built into a separate library instead of being part of >> libjvm.so or jvm.dll. >> >> The following changes are made: >>The source for these methods is placed in the jdk.incubator.vector module >> under src/jdk.incubator.vector/linux/native/libsvml and >> src/jdk.incubator.vector/windows/native/libsvml. >>The assembly source files are named as “*.S” and include files are named >> as “*.S.inc”. >>The corresponding build script is placed at >> make/modules/jdk.incubator.vector/Lib.gmk. >>Changes are made to build system to support dependency tracking for >> assembly files with includes. >>The built native libraries (libsvml.so/svml.dll) are placed in bin >> directory of JDK on Windows and lib directory of JDK on Linux. >>The C2 JIT uses the dll_load and dll_lookup to get the addresses of >> optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by >> Magnus (magnus.ihse.bur...@oracle.com). >> >> This work is part of second round of incubation of the Vector API. >> JEP: https://bugs.openjdk.java.net/browse/JDK-8261663 >> >> Please review. >> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector.COSH 17.34 113.34 ops/ms 6.54 >> Double64Vector.EXP 27.08 203.53 ops/ms 7.52 >> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 >> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 >> Double64Vector.LOG 26.75 142.63 ops/ms 5.33 >> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 >> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 >> Double64Vector.SIN 23.28 146.91 ops/ms 6.31 >> Double64Vector.SINH 17.62 88.59 ops/ms 5.03 >> Double64Vector.TAN 21.00 86.43 ops/ms 4.12 >> Double64Vector.TANH 23.75 111.35 ops/ms 4.69 >> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 >> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 >> Float128Vector.ATAN 22.52 318.74 ops/
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]
> Intel Short Vector Math Library (SVML) based intrinsics in native x86 > assembly provide optimized implementation for Vector API transcendental and > trigonometric methods. > These methods are built into a separate library instead of being part of > libjvm.so or jvm.dll. > > The following changes are made: >The source for these methods is placed in the jdk.incubator.vector module > under src/jdk.incubator.vector/linux/native/libsvml and > src/jdk.incubator.vector/windows/native/libsvml. >The assembly source files are named as “*.S” and include files are named > as “*.S.inc”. >The corresponding build script is placed at > make/modules/jdk.incubator.vector/Lib.gmk. >Changes are made to build system to support dependency tracking for > assembly files with includes. >The built native libraries (libsvml.so/svml.dll) are placed in bin > directory of JDK on Windows and lib directory of JDK on Linux. >The C2 JIT uses the dll_load and dll_lookup to get the addresses of > optimized methods from this library. > > Build system changes and module library build scripts are contributed by > Magnus (magnus.ihse.bur...@oracle.com). > > This work is part of second round of incubation of the Vector API. > JEP: https://bugs.openjdk.java.net/browse/JDK-8261663 > > Please review. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.91 ops/ms 6.31 > Double64Vector.SINH 17.62 88.59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vect
RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics
Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods. These methods are built into a separate library instead of being part of libjvm.so or jvm.dll. The following changes are made: The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml. The assembly source files are named as “*.S” and include files are named as “*.S.inc”. The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk. Changes are made to build system to support dependency tracking for assembly files with includes. The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux. The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library. Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bur...@oracle.com). This work is part of second round of incubation of the Vector API. JEP: https://bugs.openjdk.java.net/browse/JDK-8261663 Please review. Performance: Micro benchmark Base Optimized Unit Gain(Optimized/Base) Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 Double128Vector.COS 49.94 245.89 ops/ms 4.92 Double128Vector.COSH 26.91 126.00 ops/ms 4.68 Double128Vector.EXP 71.64 379.65 ops/ms 5.30 Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 Double128Vector.LOG 61.95 279.84 ops/ms 4.52 Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 Double128Vector.SIN 49.36 240.79 ops/ms 4.88 Double128Vector.SINH 26.59 103.75 ops/ms 3.90 Double128Vector.TAN 41.05 152.39 ops/ms 3.71 Double128Vector.TANH 45.29 169.53 ops/ms 3.74 Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 Double256Vector.COS 58.26 389.77 ops/ms 6.69 Double256Vector.COSH 29.44 151.11 ops/ms 5.13 Double256Vector.EXP 86.67 564.68 ops/ms 6.52 Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 Double256Vector.LOG 71.52 394.90 ops/ms 5.52 Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 Double256Vector.SIN 57.06 380.98 ops/ms 6.68 Double256Vector.SINH 29.40 117.37 ops/ms 3.99 Double256Vector.TAN 44.90 279.90 ops/ms 6.23 Double256Vector.TANH 54.08 274.71 ops/ms 5.08 Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 Double512Vector.COS 59.88 837.04 ops/ms 13.98 Double512Vector.COSH 30.34 172.76 ops/ms 5.70 Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 Double512Vector.LOG 74.84 996.00 ops/ms 13.31 Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 Double512Vector.POW 37.42 384.13 ops/ms 10.26 Double512Vector.SIN 59.74 728.45 ops/ms 12.19 Double512Vector.SINH 29.47 143.38 ops/ms 4.87 Double512Vector.TAN 46.20 587.21 ops/ms 12.71 Double512Vector.TANH 57.36 495.42 ops/ms 8.64 Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 Double64Vector.COS 23.42 152.01 ops/ms 6.49 Double64Vector.COSH 17.34 113.34 ops/ms 6.54 Double64Vector.EXP 27.08 203.53 ops/ms 7.52 Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 Double64Vector.LOG 26.75 142.63 ops/ms 5.33 Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 Double64Vector.SIN 23.28 146.91 ops/ms 6.31 Double64Vector.SINH 17.62 88.59 ops/ms 5.03 Double64Vector.TAN 21.00 86.43 ops/ms 4.12 Double64Vector.TANH 23.75 111.35 ops/ms 4.69 Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 Float128Vector.COS 42.82 803.02 ops/ms 18.75 Float128Vector.COSH 31.44 118.34 ops/ms 3.76 Float128Vector.EXP 72.43 855.33 ops/ms 11.81 Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 Float128Vector.LOG 52.95 877.94 ops/ms 16.5