Re: RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency [v2]
On Fri, 5 Nov 2021 00:56:05 GMT, Vladimir Kozlov wrote: >> Thanks a lot @vnkozlov. > > @sviswa7 testing passed you can integrate. Thanks a lot @vnkozlov for testing and review. Thanks @erikj79 @PaulSandoz @magicus for the review. - PR: https://git.openjdk.java.net/jdk/pull/6265
Integrated: 8276025: Hotspot's libsvml.so may conflict with user dependency
On Thu, 4 Nov 2021 17:48:56 GMT, Sandhya Viswanathan wrote: > This patch removes conflicts with libsvml.so distributed with Intel's MKL > library: > Renames exported symbols from __svml to __jsvml. > Renames library from libsvml.so to libjsvml.so. > Updates the stubGenerator_x86_64.cpp accordingly to load libjsvml.so and > the renamed symbols. > Updates tests to look for the new library. > > Please review. > > Best Regards, > Sandhya This pull request has now been integrated. Changeset: 9ad4d3d0 Author:Sandhya Viswanathan URL: https://git.openjdk.java.net/jdk/commit/9ad4d3d06bb356436d69af07726ef6727c500f59 Stats: 391989 lines in 123 files changed: 192755 ins; 192755 del; 6479 mod 8276025: Hotspot's libsvml.so may conflict with user dependency Reviewed-by: kvn, erikj, psandoz, ihse - PR: https://git.openjdk.java.net/jdk/pull/6265
Re: RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency [v2]
> This patch removes conflicts with libsvml.so distributed with Intel's MKL > library: > Renames exported symbols from __svml to __jsvml. > Renames library from libsvml.so to libjsvml.so. > Updates the stubGenerator_x86_64.cpp accordingly to load libjsvml.so and > the renamed symbols. > Updates tests to look for the new library. > > Please review. > > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: change filename to jsvml* - Changes: - all: https://git.openjdk.java.net/jdk/pull/6265/files - new: https://git.openjdk.java.net/jdk/pull/6265/files/7c488c10..70d962ae Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6265&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6265&range=00-01 Stats: 0 lines in 72 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6265.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6265/head:pull/6265 PR: https://git.openjdk.java.net/jdk/pull/6265
Re: RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency
On Thu, 4 Nov 2021 18:11:41 GMT, Vladimir Kozlov wrote: >> This patch removes conflicts with libsvml.so distributed with Intel's MKL >> library: >> Renames exported symbols from __svml to __jsvml. >> Renames library from libsvml.so to libjsvml.so. >> Updates the stubGenerator_x86_64.cpp accordingly to load libjsvml.so and >> the renamed symbols. >> Updates tests to look for the new library. >> >> Please review. >> >> Best Regards, >> Sandhya > > Looks good. I will run tests. Thanks a lot @vnkozlov. - PR: https://git.openjdk.java.net/jdk/pull/6265
RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency
This patch removes conflicts with libsvml.so distributed with Intel's MKL library: Renames exported symbols from __svml to __jsvml. Renames library from libsvml.so to libjsvml.so. Updates the stubGenerator_x86_64.cpp accordingly to load libjsvml.so and the renamed symbols. Updates tests to look for the new library. Please review. Best Regards, Sandhya - Commit messages: - load svml test fixes - update tests - 8276025: Hotspot's libsvml.so may conflict with user dependency Changes: https://git.openjdk.java.net/jdk/pull/6265/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6265&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276025 Stats: 391989 lines in 123 files changed: 192755 ins; 192755 del; 6479 mod Patch: https://git.openjdk.java.net/jdk/pull/6265.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6265/head:pull/6265 PR: https://git.openjdk.java.net/jdk/pull/6265
Re: RFR: 8273459: Update code segment alignment to 64 bytes [v4]
On Tue, 28 Sep 2021 17:31:24 GMT, Scott Gibbons wrote: >> Change the default code entry alignment to 64 bytes from 32 bytes. This >> allows for maintaining proper 64-byte alignment of data within a code >> segment, which is required by several AVX-512 instructions. >> >> I ran into this while implementing Base64 encoding and decoding. Code >> segments which were allocated with the address mod 32 == 0 but with the >> address mod 64 != 0 would cause the align() macro to misalign. This is >> because the align macro aligns to the size of the code segment and not the >> offset of the PC. So align(64) would align the PC to a multiple of 64 bytes >> from the start of the segment, and not to a pure 64-byte boundary as >> requested. Changing the alignment of the segment to 64 bytes fixes the >> issue. >> >> I have not seen any measurable difference in either performance or memory >> usage with the tests I have run. >> >> See [this >> ](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054180.html) >> article for the discussion thread. > > Scott Gibbons has updated the pull request incrementally with two additional > commits since the last revision: > > - Merge branch 'asgibbons-align-fix' of https://github.com/asgibbons/jdk > into asgibbons-align-fix > - Revert .gitignore. Add comments and assert in align(). The updated patch looks good to me as well. - Marked as reviewed by sviswanathan (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5547
Re: RFR: 8273459: Update code segment alignment to 64 bytes [v4]
On Tue, 28 Sep 2021 17:31:24 GMT, Scott Gibbons wrote: >> Change the default code entry alignment to 64 bytes from 32 bytes. This >> allows for maintaining proper 64-byte alignment of data within a code >> segment, which is required by several AVX-512 instructions. >> >> I ran into this while implementing Base64 encoding and decoding. Code >> segments which were allocated with the address mod 32 == 0 but with the >> address mod 64 != 0 would cause the align() macro to misalign. This is >> because the align macro aligns to the size of the code segment and not the >> offset of the PC. So align(64) would align the PC to a multiple of 64 bytes >> from the start of the segment, and not to a pure 64-byte boundary as >> requested. Changing the alignment of the segment to 64 bytes fixes the >> issue. >> >> I have not seen any measurable difference in either performance or memory >> usage with the tests I have run. >> >> See [this >> ](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054180.html) >> article for the discussion thread. > > Scott Gibbons has updated the pull request incrementally with two additional > commits since the last revision: > > - Merge branch 'asgibbons-align-fix' of https://github.com/asgibbons/jdk > into asgibbons-align-fix > - Revert .gitignore. Add comments and assert in align(). Looks good to me as well. - PR: https://git.openjdk.java.net/jdk/pull/5547
Re: RFR: 8273459: Update code segment alignment to 64 bytes
On Fri, 17 Sep 2021 14:00:44 GMT, Scott Gibbons wrote: >> Change the default code entry alignment to 64 bytes from 32 bytes. This >> allows for maintaining proper 64-byte alignment of data within a code >> segment, which is required by several AVX-512 instructions. >> >> I ran into this while implementing Base64 encoding and decoding. Code >> segments which were allocated with the address mod 32 == 0 but with the >> address mod 64 != 0 would cause the align() macro to misalign. This is >> because the align macro aligns to the size of the code segment and not the >> offset of the PC. So align(64) would align the PC to a multiple of 64 bytes >> from the start of the segment, and not to a pure 64-byte boundary as >> requested. Changing the alignment of the segment to 64 bytes fixes the >> issue. >> >> I have not seen any measurable difference in either performance or memory >> usage with the tests I have run. >> >> See [this >> ](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054180.html) >> article for the discussion thread. > > I think I have not made the point clearly enough. The `align` function is > used to manipulate the address bits for the byte following the `align()`. > This means that wherever the code is copied, the address of that byte should > have the appropriate address bit configuration in the copy (as well as the > original, of course). Since the current implementation is using the base > address of the allocated segment to determine alignment, the only way to > ensure the proper bit configuration of the address is to ensure the base > address of the newly-allocated segment is aligned identically to the original. > > I believe this is entirely independent of `MaxVectorSize`, so I don't believe > it's appropriate to use this value for address alignment. Using `pc()` fixes > the case in the source segment, but will break 50% of the time when the > segment is copied with a `CodeEntryAlignment` of 32. > > I think the bottom line is that `align()` is broken for any value greater > than `CodeEntryAlignment`. I can foresee a case where it may be beneficial > (from an algorithm perspective) to have large alignment values, like > align(256) to simplify pointer arithmetic (for example). All of these > proposed changes will not ensure this alignment when a segment is copied. > > Perhaps the appropriate thing to do is to put an `assert()` in `align()` to > fail if the requested alignment cannot be ensured? > > IMHO, the "right" thing to do is to mark the bytes requiring address > alignment and handle the cases on copy. This would add significant > complexity, however. @asgibbons To me Vladimir Kozlov's suggestion of adding a align64() method calling pc() as you originally proposed looks the best. It meets our purpose and is limited in scope. - PR: https://git.openjdk.java.net/jdk/pull/5547
Re: RFR: 8268276: Base64 Decoding optimization for x86 using AVX-512 [v7]
On Thu, 24 Jun 2021 14:50:01 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Fixing Windows build warnings > > The rest of testing hs-tier1-4 and xcomp is finished and clean. > So this is the only failure. I attached hs_err file to RFE. Thanks a lot @vnkozlov for the review and test. - PR: https://git.openjdk.java.net/jdk/pull/4368
Re: RFR: 8268276: Base64 Decoding optimization for x86 using AVX-512 [v6]
On Tue, 22 Jun 2021 20:47:55 GMT, Scott Gibbons wrote: >> Add the Base64 Decode intrinsic for x86 to utilize AVX-512 for acceleration. >> Also allows for performance improvement for non-AVX-512 enabled platforms. >> Due to the nature of MIME-encoded inputs, modify the intrinsic signature to >> accept an additional parameter (isMIME) for fast-path MIME decoding. >> >> A change was made to the signature of DecodeBlock in Base64.java to provide >> the intrinsic information as to whether MIME decoding was being done. This >> allows for the intrinsic to bypass the expensive setup of zmm registers from >> AVX tables, knowing there may be invalid Base64 characters every 76 >> characters or so. A change was also made here removing the restriction that >> the intrinsic must return an even multiple of 3 bytes decoded. This >> implementation handles the pad characters at the end of the string and will >> return the actual number of characters decoded. >> >> The AVX portion of this code will decode in blocks of 256 bytes per loop >> iteration, then in chunks of 64 bytes, followed by end fixup decoding. The >> non-AVX code is an assembly-optimized version of the java DecodeBlock and >> behaves identically. >> >> Running the Base64Decode benchmark, this change increases decode performance >> by an average of 2.6x with a maximum 19.7x for buffers > ~20k. The numbers >> are given in the table below. >> >> **Base Score** is without intrinsic support, **Optimized Score** is using >> this intrinsic, and **Gain** is **Base** / **Optimized**. >> >> >> Benchmark Name | Base Score | Optimized Score | Gain >> -- | -- | -- | -- >> testBase64Decode size 1 | 15.36 | 15.32 | 1.00 >> testBase64Decode size 3 | 17.00 | 16.72 | 1.02 >> testBase64Decode size 7 | 20.60 | 18.82 | 1.09 >> testBase64Decode size 32 | 34.21 | 26.77 | 1.28 >> testBase64Decode size 64 | 54.43 | 38.35 | 1.42 >> testBase64Decode size 80 | 66.40 | 48.34 | 1.37 >> testBase64Decode size 96 | 73.16 | 52.90 | 1.38 >> testBase64Decode size 112 | 84.93 | 51.82 | 1.64 >> testBase64Decode size 512 | 288.81 | 32.04 | 9.01 >> testBase64Decode size 1000 | 560.48 | 40.79 | 13.74 >> testBase64Decode size 2 | 9530.28 | 483.37 | 19.72 >> testBase64Decode size 5 | 24552.24 | 1735.07 | 14.15 >> testBase64MIMEDecode size 1 | 22.87 | 21.36 | 1.07 >> testBase64MIMEDecode size 3 | 27.79 | 25.32 | 1.10 >> testBase64MIMEDecode size 7 | 44.74 | 43.81 | 1.02 >> testBase64MIMEDecode size 32 | 142.69 | 129.56 | 1.10 >> testBase64MIMEDecode size 64 | 256.90 | 243.80 | 1.05 >> testBase64MIMEDecode size 80 | 311.60 | 310.80 | 1.00 >> testBase64MIMEDecode size 96 | 364.00 | 346.66 | 1.05 >> testBase64MIMEDecode size 112 | 472.88 | 394.78 | 1.20 >> testBase64MIMEDecode size 512 | 1814.96 | 1671.28 | 1.09 >> testBase64MIMEDecode size 1000 | 3623.50 | 3227.61 | 1.12 >> testBase64MIMEDecode size 2 | 70484.09 | 64940.77 | 1.09 >> testBase64MIMEDecode size 5 | 191732.34 | 158158.95 | 1.21 >> testBase64WithErrorInputsDecode size 1 | 1531.02 | 1185.19 | 1.29 >> testBase64WithErrorInputsDecode size 3 | 1306.59 | 1170.99 | 1.12 >> testBase64WithErrorInputsDecode size 7 | 1238.11 | 1176.62 | 1.05 >> testBase64WithErrorInputsDecode size 32 | 1346.46 | 1138.47 | 1.18 >> testBase64WithErrorInputsDecode size 64 | 1195.28 | 1172.52 | 1.02 >> testBase64WithErrorInputsDecode size 80 | 1469.00 | 1180.94 | 1.24 >> testBase64WithErrorInputsDecode size 96 | 1434.48 | 1167.74 | 1.23 >> testBase64WithErrorInputsDecode size 112 | 1440.06 | 1162.56 | 1.24 >> testBase64WithErrorInputsDecode size 512 | 1362.79 | 1193.42 | 1.14 >> testBase64WithErrorInputsDecode size 1000 | 1426.07 | 1194.44 | 1.19 >> testBase64WithErrorInputsDecode size 2 | 1398.44 | 1138.17 | 1.23 >> testBase64WithErrorInputsDecode size 5 | 1409.41 | 1114.16 | 1.26 > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Addressing review comments. > > 1. Changed errorvec handling > 2. Removed unnecessary register copies and aliasing > 3. Streamlined mask generation @asgibbons The patch looks good to me. @vnkozlov We need one more review for this patch. Could you please help? - PR: https://git.openjdk.java.net/jdk/pull/4368
Re: RFR: 8268276: Base64 Decoding optimization for x86 using AVX-512 [v6]
On Tue, 22 Jun 2021 20:47:55 GMT, Scott Gibbons wrote: >> Add the Base64 Decode intrinsic for x86 to utilize AVX-512 for acceleration. >> Also allows for performance improvement for non-AVX-512 enabled platforms. >> Due to the nature of MIME-encoded inputs, modify the intrinsic signature to >> accept an additional parameter (isMIME) for fast-path MIME decoding. >> >> A change was made to the signature of DecodeBlock in Base64.java to provide >> the intrinsic information as to whether MIME decoding was being done. This >> allows for the intrinsic to bypass the expensive setup of zmm registers from >> AVX tables, knowing there may be invalid Base64 characters every 76 >> characters or so. A change was also made here removing the restriction that >> the intrinsic must return an even multiple of 3 bytes decoded. This >> implementation handles the pad characters at the end of the string and will >> return the actual number of characters decoded. >> >> The AVX portion of this code will decode in blocks of 256 bytes per loop >> iteration, then in chunks of 64 bytes, followed by end fixup decoding. The >> non-AVX code is an assembly-optimized version of the java DecodeBlock and >> behaves identically. >> >> Running the Base64Decode benchmark, this change increases decode performance >> by an average of 2.6x with a maximum 19.7x for buffers > ~20k. The numbers >> are given in the table below. >> >> **Base Score** is without intrinsic support, **Optimized Score** is using >> this intrinsic, and **Gain** is **Base** / **Optimized**. >> >> >> Benchmark Name | Base Score | Optimized Score | Gain >> -- | -- | -- | -- >> testBase64Decode size 1 | 15.36 | 15.32 | 1.00 >> testBase64Decode size 3 | 17.00 | 16.72 | 1.02 >> testBase64Decode size 7 | 20.60 | 18.82 | 1.09 >> testBase64Decode size 32 | 34.21 | 26.77 | 1.28 >> testBase64Decode size 64 | 54.43 | 38.35 | 1.42 >> testBase64Decode size 80 | 66.40 | 48.34 | 1.37 >> testBase64Decode size 96 | 73.16 | 52.90 | 1.38 >> testBase64Decode size 112 | 84.93 | 51.82 | 1.64 >> testBase64Decode size 512 | 288.81 | 32.04 | 9.01 >> testBase64Decode size 1000 | 560.48 | 40.79 | 13.74 >> testBase64Decode size 2 | 9530.28 | 483.37 | 19.72 >> testBase64Decode size 5 | 24552.24 | 1735.07 | 14.15 >> testBase64MIMEDecode size 1 | 22.87 | 21.36 | 1.07 >> testBase64MIMEDecode size 3 | 27.79 | 25.32 | 1.10 >> testBase64MIMEDecode size 7 | 44.74 | 43.81 | 1.02 >> testBase64MIMEDecode size 32 | 142.69 | 129.56 | 1.10 >> testBase64MIMEDecode size 64 | 256.90 | 243.80 | 1.05 >> testBase64MIMEDecode size 80 | 311.60 | 310.80 | 1.00 >> testBase64MIMEDecode size 96 | 364.00 | 346.66 | 1.05 >> testBase64MIMEDecode size 112 | 472.88 | 394.78 | 1.20 >> testBase64MIMEDecode size 512 | 1814.96 | 1671.28 | 1.09 >> testBase64MIMEDecode size 1000 | 3623.50 | 3227.61 | 1.12 >> testBase64MIMEDecode size 2 | 70484.09 | 64940.77 | 1.09 >> testBase64MIMEDecode size 5 | 191732.34 | 158158.95 | 1.21 >> testBase64WithErrorInputsDecode size 1 | 1531.02 | 1185.19 | 1.29 >> testBase64WithErrorInputsDecode size 3 | 1306.59 | 1170.99 | 1.12 >> testBase64WithErrorInputsDecode size 7 | 1238.11 | 1176.62 | 1.05 >> testBase64WithErrorInputsDecode size 32 | 1346.46 | 1138.47 | 1.18 >> testBase64WithErrorInputsDecode size 64 | 1195.28 | 1172.52 | 1.02 >> testBase64WithErrorInputsDecode size 80 | 1469.00 | 1180.94 | 1.24 >> testBase64WithErrorInputsDecode size 96 | 1434.48 | 1167.74 | 1.23 >> testBase64WithErrorInputsDecode size 112 | 1440.06 | 1162.56 | 1.24 >> testBase64WithErrorInputsDecode size 512 | 1362.79 | 1193.42 | 1.14 >> testBase64WithErrorInputsDecode size 1000 | 1426.07 | 1194.44 | 1.19 >> testBase64WithErrorInputsDecode size 2 | 1398.44 | 1138.17 | 1.23 >> testBase64WithErrorInputsDecode size 5 | 1409.41 | 1114.16 | 1.26 > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Addressing review comments. > > 1. Changed errorvec handling > 2. Removed unnecessary register copies and aliasing > 3. Streamlined mask generation Marked as reviewed by sviswanathan (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/4368
Re: RFR: 8268276: Base64 Decoding optimization for x86 using AVX-512 [v5]
On Fri, 18 Jun 2021 22:12:11 GMT, Scott Gibbons wrote: >> Add the Base64 Decode intrinsic for x86 to utilize AVX-512 for acceleration. >> Also allows for performance improvement for non-AVX-512 enabled platforms. >> Due to the nature of MIME-encoded inputs, modify the intrinsic signature to >> accept an additional parameter (isMIME) for fast-path MIME decoding. >> >> A change was made to the signature of DecodeBlock in Base64.java to provide >> the intrinsic information as to whether MIME decoding was being done. This >> allows for the intrinsic to bypass the expensive setup of zmm registers from >> AVX tables, knowing there may be invalid Base64 characters every 76 >> characters or so. A change was also made here removing the restriction that >> the intrinsic must return an even multiple of 3 bytes decoded. This >> implementation handles the pad characters at the end of the string and will >> return the actual number of characters decoded. >> >> The AVX portion of this code will decode in blocks of 256 bytes per loop >> iteration, then in chunks of 64 bytes, followed by end fixup decoding. The >> non-AVX code is an assembly-optimized version of the java DecodeBlock and >> behaves identically. >> >> Running the Base64Decode benchmark, this change increases decode performance >> by an average of 2.6x with a maximum 19.7x for buffers > ~20k. The numbers >> are given in the table below. >> >> **Base Score** is without intrinsic support, **Optimized Score** is using >> this intrinsic, and **Gain** is **Base** / **Optimized**. >> >> >> Benchmark Name | Base Score | Optimized Score | Gain >> -- | -- | -- | -- >> testBase64Decode size 1 | 15.36 | 15.32 | 1.00 >> testBase64Decode size 3 | 17.00 | 16.72 | 1.02 >> testBase64Decode size 7 | 20.60 | 18.82 | 1.09 >> testBase64Decode size 32 | 34.21 | 26.77 | 1.28 >> testBase64Decode size 64 | 54.43 | 38.35 | 1.42 >> testBase64Decode size 80 | 66.40 | 48.34 | 1.37 >> testBase64Decode size 96 | 73.16 | 52.90 | 1.38 >> testBase64Decode size 112 | 84.93 | 51.82 | 1.64 >> testBase64Decode size 512 | 288.81 | 32.04 | 9.01 >> testBase64Decode size 1000 | 560.48 | 40.79 | 13.74 >> testBase64Decode size 2 | 9530.28 | 483.37 | 19.72 >> testBase64Decode size 5 | 24552.24 | 1735.07 | 14.15 >> testBase64MIMEDecode size 1 | 22.87 | 21.36 | 1.07 >> testBase64MIMEDecode size 3 | 27.79 | 25.32 | 1.10 >> testBase64MIMEDecode size 7 | 44.74 | 43.81 | 1.02 >> testBase64MIMEDecode size 32 | 142.69 | 129.56 | 1.10 >> testBase64MIMEDecode size 64 | 256.90 | 243.80 | 1.05 >> testBase64MIMEDecode size 80 | 311.60 | 310.80 | 1.00 >> testBase64MIMEDecode size 96 | 364.00 | 346.66 | 1.05 >> testBase64MIMEDecode size 112 | 472.88 | 394.78 | 1.20 >> testBase64MIMEDecode size 512 | 1814.96 | 1671.28 | 1.09 >> testBase64MIMEDecode size 1000 | 3623.50 | 3227.61 | 1.12 >> testBase64MIMEDecode size 2 | 70484.09 | 64940.77 | 1.09 >> testBase64MIMEDecode size 5 | 191732.34 | 158158.95 | 1.21 >> testBase64WithErrorInputsDecode size 1 | 1531.02 | 1185.19 | 1.29 >> testBase64WithErrorInputsDecode size 3 | 1306.59 | 1170.99 | 1.12 >> testBase64WithErrorInputsDecode size 7 | 1238.11 | 1176.62 | 1.05 >> testBase64WithErrorInputsDecode size 32 | 1346.46 | 1138.47 | 1.18 >> testBase64WithErrorInputsDecode size 64 | 1195.28 | 1172.52 | 1.02 >> testBase64WithErrorInputsDecode size 80 | 1469.00 | 1180.94 | 1.24 >> testBase64WithErrorInputsDecode size 96 | 1434.48 | 1167.74 | 1.23 >> testBase64WithErrorInputsDecode size 112 | 1440.06 | 1162.56 | 1.24 >> testBase64WithErrorInputsDecode size 512 | 1362.79 | 1193.42 | 1.14 >> testBase64WithErrorInputsDecode size 1000 | 1426.07 | 1194.44 | 1.19 >> testBase64WithErrorInputsDecode size 2 | 1398.44 | 1138.17 | 1.23 >> testBase64WithErrorInputsDecode size 5 | 1409.41 | 1114.16 | 1.26 > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Added comments. Streamlined flow for decode. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6155: > 6153: __ subl(output_size, length); > 6154: __ movq(rax, -1); > 6155: __ shrxq(rax, rax, output_size);// Input mask in rax I think this could also be implemented as: __ movq(rax, -1); __ bzhiq(rax, rax, length); src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6173: > 6171: __ movq(rax, 64); > 6172: __ subq(rax, output_size); > 6173: __ shrxq(output_mask, output_mask, rax); The output mask can also be computed using bzhiq: __ movq(output_mask, -1); __ bzhiq(output_mask, output_mask, output_size); src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6243: > 6241: > 6242: __ BIND(L_padding); > 6243: __ decrementq(r13, 1); It will be good to use output_size here instead of r13. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6249: > 6247: __ jcc(Assembler::notEqual, L_donePadding); > 6248: > 6249: __ decrementq(r13, 1); It will be good to use
Re: RFR: 8268276: Base64 Decoding optimization for x86 using AVX-512 [v5]
On Fri, 18 Jun 2021 22:12:11 GMT, Scott Gibbons wrote: >> Add the Base64 Decode intrinsic for x86 to utilize AVX-512 for acceleration. >> Also allows for performance improvement for non-AVX-512 enabled platforms. >> Due to the nature of MIME-encoded inputs, modify the intrinsic signature to >> accept an additional parameter (isMIME) for fast-path MIME decoding. >> >> A change was made to the signature of DecodeBlock in Base64.java to provide >> the intrinsic information as to whether MIME decoding was being done. This >> allows for the intrinsic to bypass the expensive setup of zmm registers from >> AVX tables, knowing there may be invalid Base64 characters every 76 >> characters or so. A change was also made here removing the restriction that >> the intrinsic must return an even multiple of 3 bytes decoded. This >> implementation handles the pad characters at the end of the string and will >> return the actual number of characters decoded. >> >> The AVX portion of this code will decode in blocks of 256 bytes per loop >> iteration, then in chunks of 64 bytes, followed by end fixup decoding. The >> non-AVX code is an assembly-optimized version of the java DecodeBlock and >> behaves identically. >> >> Running the Base64Decode benchmark, this change increases decode performance >> by an average of 2.6x with a maximum 19.7x for buffers > ~20k. The numbers >> are given in the table below. >> >> **Base Score** is without intrinsic support, **Optimized Score** is using >> this intrinsic, and **Gain** is **Base** / **Optimized**. >> >> >> Benchmark Name | Base Score | Optimized Score | Gain >> -- | -- | -- | -- >> testBase64Decode size 1 | 15.36 | 15.32 | 1.00 >> testBase64Decode size 3 | 17.00 | 16.72 | 1.02 >> testBase64Decode size 7 | 20.60 | 18.82 | 1.09 >> testBase64Decode size 32 | 34.21 | 26.77 | 1.28 >> testBase64Decode size 64 | 54.43 | 38.35 | 1.42 >> testBase64Decode size 80 | 66.40 | 48.34 | 1.37 >> testBase64Decode size 96 | 73.16 | 52.90 | 1.38 >> testBase64Decode size 112 | 84.93 | 51.82 | 1.64 >> testBase64Decode size 512 | 288.81 | 32.04 | 9.01 >> testBase64Decode size 1000 | 560.48 | 40.79 | 13.74 >> testBase64Decode size 2 | 9530.28 | 483.37 | 19.72 >> testBase64Decode size 5 | 24552.24 | 1735.07 | 14.15 >> testBase64MIMEDecode size 1 | 22.87 | 21.36 | 1.07 >> testBase64MIMEDecode size 3 | 27.79 | 25.32 | 1.10 >> testBase64MIMEDecode size 7 | 44.74 | 43.81 | 1.02 >> testBase64MIMEDecode size 32 | 142.69 | 129.56 | 1.10 >> testBase64MIMEDecode size 64 | 256.90 | 243.80 | 1.05 >> testBase64MIMEDecode size 80 | 311.60 | 310.80 | 1.00 >> testBase64MIMEDecode size 96 | 364.00 | 346.66 | 1.05 >> testBase64MIMEDecode size 112 | 472.88 | 394.78 | 1.20 >> testBase64MIMEDecode size 512 | 1814.96 | 1671.28 | 1.09 >> testBase64MIMEDecode size 1000 | 3623.50 | 3227.61 | 1.12 >> testBase64MIMEDecode size 2 | 70484.09 | 64940.77 | 1.09 >> testBase64MIMEDecode size 5 | 191732.34 | 158158.95 | 1.21 >> testBase64WithErrorInputsDecode size 1 | 1531.02 | 1185.19 | 1.29 >> testBase64WithErrorInputsDecode size 3 | 1306.59 | 1170.99 | 1.12 >> testBase64WithErrorInputsDecode size 7 | 1238.11 | 1176.62 | 1.05 >> testBase64WithErrorInputsDecode size 32 | 1346.46 | 1138.47 | 1.18 >> testBase64WithErrorInputsDecode size 64 | 1195.28 | 1172.52 | 1.02 >> testBase64WithErrorInputsDecode size 80 | 1469.00 | 1180.94 | 1.24 >> testBase64WithErrorInputsDecode size 96 | 1434.48 | 1167.74 | 1.23 >> testBase64WithErrorInputsDecode size 112 | 1440.06 | 1162.56 | 1.24 >> testBase64WithErrorInputsDecode size 512 | 1362.79 | 1193.42 | 1.14 >> testBase64WithErrorInputsDecode size 1000 | 1426.07 | 1194.44 | 1.19 >> testBase64WithErrorInputsDecode size 2 | 1398.44 | 1138.17 | 1.23 >> testBase64WithErrorInputsDecode size 5 | 1409.41 | 1114.16 | 1.26 > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Added comments. Streamlined flow for decode. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6004: > 6002: __ BIND(L_continue); > 6003: > 6004: __ vpxor(errorvec, errorvec, errorvec, Assembler::AVX_512bit); Why clearing errorvec is needed here? src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6023: > 6021: __ evmovdquq(tmp16_op3, pack16_op, Assembler::AVX_512bit); > 6022: __ evmovdquq(tmp16_op2, pack16_op, Assembler::AVX_512bit); > 6023: __ evmovdquq(tmp16_op1, pack16_op, Assembler::AVX_512bit); Why do we need 3 additional copies of pack16_op? src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6026: > 6024: __ evmovdquq(tmp32_op3, pack32_op, Assembler::AVX_512bit); > 6025: __ evmovdquq(tmp32_op2, pack32_op, Assembler::AVX_512bit); > 6026: __ evmovdquq(tmp32_op1, pack32_op, Assembler::AVX_512bit); Why do we need 3 additional copies of pack32_op? src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6051: > 6049: __ vpternlogd(t0, 0xfe, input1, input2, Ass
Re: RFR: 8268276: Base64 Decoding optimization for x86 using AVX-512 [v3]
On Tue, 8 Jun 2021 00:30:38 GMT, Scott Gibbons wrote: >> Add the Base64 Decode intrinsic for x86 to utilize AVX-512 for acceleration. >> Also allows for performance improvement for non-AVX-512 enabled platforms. >> Due to the nature of MIME-encoded inputs, modify the intrinsic signature to >> accept an additional parameter (isMIME) for fast-path MIME decoding. >> >> A change was made to the signature of DecodeBlock in Base64.java to provide >> the intrinsic information as to whether MIME decoding was being done. This >> allows for the intrinsic to bypass the expensive setup of zmm registers from >> AVX tables, knowing there may be invalid Base64 characters every 76 >> characters or so. A change was also made here removing the restriction that >> the intrinsic must return an even multiple of 3 bytes decoded. This >> implementation handles the pad characters at the end of the string and will >> return the actual number of characters decoded. >> >> The AVX portion of this code will decode in blocks of 256 bytes per loop >> iteration, then in chunks of 64 bytes, followed by end fixup decoding. The >> non-AVX code is an assembly-optimized version of the java DecodeBlock and >> behaves identically. >> >> Running the Base64Decode benchmark, this change increases decode performance >> by an average of 2.6x with a maximum 19.7x for buffers > ~20k. The numbers >> are given in the table below. >> >> **Base Score** is without intrinsic support, **Optimized Score** is using >> this intrinsic, and **Gain** is **Base** / **Optimized**. >> >> >> Benchmark Name | Base Score | Optimized Score | Gain >> -- | -- | -- | -- >> testBase64Decode size 1 | 15.36 | 15.32 | 1.00 >> testBase64Decode size 3 | 17.00 | 16.72 | 1.02 >> testBase64Decode size 7 | 20.60 | 18.82 | 1.09 >> testBase64Decode size 32 | 34.21 | 26.77 | 1.28 >> testBase64Decode size 64 | 54.43 | 38.35 | 1.42 >> testBase64Decode size 80 | 66.40 | 48.34 | 1.37 >> testBase64Decode size 96 | 73.16 | 52.90 | 1.38 >> testBase64Decode size 112 | 84.93 | 51.82 | 1.64 >> testBase64Decode size 512 | 288.81 | 32.04 | 9.01 >> testBase64Decode size 1000 | 560.48 | 40.79 | 13.74 >> testBase64Decode size 2 | 9530.28 | 483.37 | 19.72 >> testBase64Decode size 5 | 24552.24 | 1735.07 | 14.15 >> testBase64MIMEDecode size 1 | 22.87 | 21.36 | 1.07 >> testBase64MIMEDecode size 3 | 27.79 | 25.32 | 1.10 >> testBase64MIMEDecode size 7 | 44.74 | 43.81 | 1.02 >> testBase64MIMEDecode size 32 | 142.69 | 129.56 | 1.10 >> testBase64MIMEDecode size 64 | 256.90 | 243.80 | 1.05 >> testBase64MIMEDecode size 80 | 311.60 | 310.80 | 1.00 >> testBase64MIMEDecode size 96 | 364.00 | 346.66 | 1.05 >> testBase64MIMEDecode size 112 | 472.88 | 394.78 | 1.20 >> testBase64MIMEDecode size 512 | 1814.96 | 1671.28 | 1.09 >> testBase64MIMEDecode size 1000 | 3623.50 | 3227.61 | 1.12 >> testBase64MIMEDecode size 2 | 70484.09 | 64940.77 | 1.09 >> testBase64MIMEDecode size 5 | 191732.34 | 158158.95 | 1.21 >> testBase64WithErrorInputsDecode size 1 | 1531.02 | 1185.19 | 1.29 >> testBase64WithErrorInputsDecode size 3 | 1306.59 | 1170.99 | 1.12 >> testBase64WithErrorInputsDecode size 7 | 1238.11 | 1176.62 | 1.05 >> testBase64WithErrorInputsDecode size 32 | 1346.46 | 1138.47 | 1.18 >> testBase64WithErrorInputsDecode size 64 | 1195.28 | 1172.52 | 1.02 >> testBase64WithErrorInputsDecode size 80 | 1469.00 | 1180.94 | 1.24 >> testBase64WithErrorInputsDecode size 96 | 1434.48 | 1167.74 | 1.23 >> testBase64WithErrorInputsDecode size 112 | 1440.06 | 1162.56 | 1.24 >> testBase64WithErrorInputsDecode size 512 | 1362.79 | 1193.42 | 1.14 >> testBase64WithErrorInputsDecode size 1000 | 1426.07 | 1194.44 | 1.19 >> testBase64WithErrorInputsDecode size 2 | 1398.44 | 1138.17 | 1.23 >> testBase64WithErrorInputsDecode size 5 | 1409.41 | 1114.16 | 1.26 > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Fixing review comments. Adding notes about isMIME parameter for other > architectures; clarifying decodeBlock comments. @asgibbons Thanks a lot for contributing this. The performance gain is impressive. I have some minor comments. Please take a look. src/hotspot/cpu/x86/assembler_x86.cpp line 4555: > 4553: void Assembler::evpmaddubsw(XMMRegister dst, XMMRegister src1, > XMMRegister src2, int vector_len) { > 4554: assert(VM_Version::supports_avx512bw(), ""); > 4555: InstructionAttr attributes(vector_len, /* rex_w */ false, /* > legacy_mode */ _legacy_mode_bw, /* no_mask_reg */ true, /* uses_vl */ true); This instruction is also supported on AVX platforms. The assert check could be as follows: assert(vector_len == AVX_128bit? VM_Version::supports_avx() : vector_len == AVX_256bit? VM_Version::supports_avx2() : vector_len == AVX_512bit? VM_Version::supports_avx512bw() : 0, ""); Accordingly the instruction could be named as vpmaddubsw. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp l
Integrated: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics
On Thu, 22 Apr 2021 19:07:28 GMT, Sandhya Viswanathan wrote: > This PR contains Short Vector Math Library support related changes for > [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), > in preparation for when targeted. > > Intel Short Vector Math Library (SVML) based intrinsics in native x86 > assembly provide optimized implementation for Vector API transcendental and > trigonometric methods. > These methods are built into a separate library instead of being part of > libjvm.so or jvm.dll. > > The following changes are made: >The source for these methods is placed in the jdk.incubator.vector module > under src/jdk.incubator.vector/linux/native/libsvml and > src/jdk.incubator.vector/windows/native/libsvml. >The assembly source files are named as “*.S” and include files are named > as “*.S.inc”. >The corresponding build script is placed at > make/modules/jdk.incubator.vector/Lib.gmk. >Changes are made to build system to support dependency tracking for > assembly files with includes. >The built native libraries (libsvml.so/svml.dll) are placed in bin > directory of JDK on Windows and lib directory of JDK on Linux. >The C2 JIT uses the dll_load and dll_lookup to get the addresses of > optimized methods from this library. > > Build system changes and module library build scripts are contributed by > Magnus (magnus.ihse.bur...@oracle.com). > > Looking forward to your review and feedback. > > Performance: > Micro benchmark Base Optimized Unit Gain(Optimized/Base) > Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 > Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 > Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 > Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 > Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 > Double128Vector.COS 49.94 245.89 ops/ms 4.92 > Double128Vector.COSH 26.91 126.00 ops/ms 4.68 > Double128Vector.EXP 71.64 379.65 ops/ms 5.30 > Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 > Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 > Double128Vector.LOG 61.95 279.84 ops/ms 4.52 > Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 > Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 > Double128Vector.SIN 49.36 240.79 ops/ms 4.88 > Double128Vector.SINH 26.59 103.75 ops/ms 3.90 > Double128Vector.TAN 41.05 152.39 ops/ms 3.71 > Double128Vector.TANH 45.29 169.53 ops/ms 3.74 > Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 > Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 > Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 > Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 > Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 > Double256Vector.COS 58.26 389.77 ops/ms 6.69 > Double256Vector.COSH 29.44 151.11 ops/ms 5.13 > Double256Vector.EXP 86.67 564.68 ops/ms 6.52 > Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 > Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 > Double256Vector.LOG 71.52 394.90 ops/ms 5.52 > Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 > Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 > Double256Vector.SIN 57.06 380.98 ops/ms 6.68 > Double256Vector.SINH 29.40 117.37 ops/ms 3.99 > Double256Vector.TAN 44.90 279.90 ops/ms 6.23 > Double256Vector.TANH 54.08 274.71 ops/ms 5.08 > Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 > Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 > Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 > Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 > Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 > Double512Vector.COS 59.88 837.04 ops/ms 13.98 > Double512Vector.COSH 30.34 172.76 ops/ms 5.70 > Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 > Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 > Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 > Double512Vector.LOG 74.84 996.00 ops/ms 13.31 > Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 > Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 > Double512Vector.POW 37.42 384.13 ops/ms 10.26 > Double512Vector.SIN 59.74 728.45 ops/ms 12.19 > Double512Vector.SINH 29.47 143.38 ops/ms 4.87 > Double512Vector.TAN 46.20 587.21 ops/ms 12.71 > Double512Vector.TANH 57.36 495.42 ops/ms 8.64 > Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 > Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 > Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 > Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 > Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 > Double64Vector.COS 23.42 152.01 ops/ms 6.49 > Double64Vector.COSH 17.34 113.34 ops/ms 6.54 > Double64Vector.EXP 27.08 203.53 ops/ms 7.52 > Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 > Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 > Double64Vector.LOG 26.75 142.63 ops/ms 5.33 > Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 > Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 > Double64Vector.SIN 23.28 146.
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v17]
59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: - Merge master - update javadoc - correct javadoc - Javadoc changes - correct ppc.ad - Merge master - Commit missing changes - Implement Vladimir Ivanov and Paul Sandoz review comments - fix 32-bit build - Add comments explaining naming convention - ... and 11 more: https://git.openjdk.java.net/jdk/compare/52d8215a...03ac3197 - Changes: https://git.openjdk.java.net/jdk/pull/3638/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=16 Stats: 416073 lines in 119 files changed: 415886 ins; 124 del; 63 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v16]
59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: update javadoc - Changes: - all: https://git.openjdk.java.net/jdk/pull/3638/files - new: https://git.openjdk.java.net/jdk/pull/3638/files/e5208a18..b229e8b4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=15 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=14-15 Stats: 18 lines in 1 file changed: 0 ins; 0 del; 18 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v15]
59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: correct javadoc - Changes: - all: https://git.openjdk.java.net/jdk/pull/3638/files - new: https://git.openjdk.java.net/jdk/pull/3638/files/6cd50248..e5208a18 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=14 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=13-14 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v14]
59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Javadoc changes - Changes: - all: https://git.openjdk.java.net/jdk/pull/3638/files - new: https://git.openjdk.java.net/jdk/pull/3638/files/4d59af0a..6cd50248 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=13 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=12-13 Stats: 58 lines in 1 file changed: 38 ins; 0 del; 20 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v13]
59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: correct ppc.ad - Changes: - all: https://git.openjdk.java.net/jdk/pull/3638/files - new: https://git.openjdk.java.net/jdk/pull/3638/files/7b959b67..4d59af0a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=12 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=11-12 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v12]
59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - Merge master - Commit missing changes - Implement Vladimir Ivanov and Paul Sandoz review comments - fix 32-bit build - Add comments explaining naming convention - jcheck fixes - Print intrinsic fix - Implement review comments - Add missing Lib.gmk - Merge master - ... and 6 more: https://git.openjdk.java.net/jdk/compare/b961f253...7b959b67 - Changes: https://git.openjdk.java.net/jdk/pull/3638/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=11 Stats: 416021 lines in 119 files changed: 415854 ins; 124 del; 43 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v11]
59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Commit missing changes - Changes: - all: https://git.openjdk.java.net/jdk/pull/3638/files - new: https://git.openjdk.java.net/jdk/pull/3638/files/0b4a1c9c..1b0367ac Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=09-10 Stats: 55 lines in 16 files changed: 2 ins; 42 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]
On Wed, 19 May 2021 22:02:14 GMT, Paul Sandoz wrote: >> Tier 1 to 3 tests pass for the default set of build profiles. > >> Thanks a lot for the review @PaulSandoz @iwanowww @erikj79. >> Paul and Vladimir, I have implemented your review comments. Please take a >> look. > > `case VECTOR_OP_OR` is still present. @PaulSandoz Thanks for pointing that out. I had missed git add for some of the files. - PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]
On Mon, 3 May 2021 21:41:26 GMT, Paul Sandoz wrote: >> Sandhya Viswanathan has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains six commits: >> >> - Merge master >> - remove whitespace >> - Merge master >> - Small fix >> - cleanup >> - x86 short vector math optimization for Vector API > > Tier 1 to 3 tests pass for the default set of build profiles. Thanks a lot for the review @PaulSandoz @iwanowww @erikj79. Paul and Vladimir, I have implemented your review comments. Please take a look. - PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v10]
59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Implement Vladimir Ivanov and Paul Sandoz review comments - Changes: - all: https://git.openjdk.java.net/jdk/pull/3638/files - new: https://git.openjdk.java.net/jdk/pull/3638/files/f7e39913..0b4a1c9c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=09 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=08-09 Stats: 45 lines in 1 file changed: 0 ins; 45 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v9]
59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: fix 32-bit build - Changes: - all: https://git.openjdk.java.net/jdk/pull/3638/files - new: https://git.openjdk.java.net/jdk/pull/3638/files/45f20a34..f7e39913 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v8]
On Wed, 19 May 2021 00:58:15 GMT, Sandhya Viswanathan wrote: >> This PR contains Short Vector Math Library support related changes for >> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), >> in preparation for when targeted. >> >> Intel Short Vector Math Library (SVML) based intrinsics in native x86 >> assembly provide optimized implementation for Vector API transcendental and >> trigonometric methods. >> These methods are built into a separate library instead of being part of >> libjvm.so or jvm.dll. >> >> The following changes are made: >>The source for these methods is placed in the jdk.incubator.vector module >> under src/jdk.incubator.vector/linux/native/libsvml and >> src/jdk.incubator.vector/windows/native/libsvml. >>The assembly source files are named as “*.S” and include files are named >> as “*.S.inc”. >>The corresponding build script is placed at >> make/modules/jdk.incubator.vector/Lib.gmk. >>Changes are made to build system to support dependency tracking for >> assembly files with includes. >>The built native libraries (libsvml.so/svml.dll) are placed in bin >> directory of JDK on Windows and lib directory of JDK on Linux. >>The C2 JIT uses the dll_load and dll_lookup to get the addresses of >> optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by >> Magnus (magnus.ihse.bur...@oracle.com). >> >> Looking forward to your review and feedback. >> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v7]
On Wed, 19 May 2021 00:26:48 GMT, Vladimir Kozlov wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one >> additional commit since the last revision: >> >> jcheck fixes > > This is much much better! Thank you for changing it. I am only asking now to > add comment explaining names. @vnkozlov I have added comments explaining naming convention. Please let me know if this looks ok. - PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v8]
59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Add comments explaining naming convention - Changes: - all: https://git.openjdk.java.net/jdk/pull/3638/files - new: https://git.openjdk.java.net/jdk/pull/3638/files/0d1d0382..45f20a34 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=06-07 Stats: 15 lines in 1 file changed: 15 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v7]
59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: jcheck fixes - Changes: - all: https://git.openjdk.java.net/jdk/pull/3638/files - new: https://git.openjdk.java.net/jdk/pull/3638/files/11528426..0d1d0382 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=05-06 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v6]
On Tue, 18 May 2021 23:43:13 GMT, Sandhya Viswanathan wrote: >> This PR contains Short Vector Math Library support related changes for >> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), >> in preparation for when targeted. >> >> Intel Short Vector Math Library (SVML) based intrinsics in native x86 >> assembly provide optimized implementation for Vector API transcendental and >> trigonometric methods. >> These methods are built into a separate library instead of being part of >> libjvm.so or jvm.dll. >> >> The following changes are made: >>The source for these methods is placed in the jdk.incubator.vector module >> under src/jdk.incubator.vector/linux/native/libsvml and >> src/jdk.incubator.vector/windows/native/libsvml. >>The assembly source files are named as “*.S” and include files are named >> as “*.S.inc”. >>The corresponding build script is placed at >> make/modules/jdk.incubator.vector/Lib.gmk. >>Changes are made to build system to support dependency tracking for >> assembly files with includes. >>The built native libraries (libsvml.so/svml.dll) are placed in bin >> directory of JDK on Windows and lib directory of JDK on Linux. >>The C2 JIT uses the dll_load and dll_lookup to get the addresses of >> optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by >> Magnus (magnus.ihse.bur...@oracle.com). >> >> Looking forward to your review and feedback. >> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v6]
59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Print intrinsic fix - Changes: - all: https://git.openjdk.java.net/jdk/pull/3638/files - new: https://git.openjdk.java.net/jdk/pull/3638/files/9021a15c..11528426 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=04-05 Stats: 9 lines in 1 file changed: 2 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v5]
59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Implement review comments - Changes: - all: https://git.openjdk.java.net/jdk/pull/3638/files - new: https://git.openjdk.java.net/jdk/pull/3638/files/01a549e4..9021a15c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=03-04 Stats: 1220 lines in 8 files changed: 48 ins; 1104 del; 68 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v4]
59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Add missing Lib.gmk - Changes: - all: https://git.openjdk.java.net/jdk/pull/3638/files - new: https://git.openjdk.java.net/jdk/pull/3638/files/6e105f51..01a549e4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=02-03 Stats: 42 lines in 1 file changed: 42 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v3]
59 ops/ms 5.03 > Double64Vector.TAN 21.00 86.43 ops/ms 4.12 > Double64Vector.TANH 23.75 111.35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - Merge master - Merge master - remove whitespace - Merge master - Small fix - cleanup - x86 short vector math optimization for Vector API - Changes: https://git.openjdk.java.net/jdk/pull/3638/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=02 Stats: 417101 lines in 120 files changed: 416935 ins; 123 del; 43 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]
On Wed, 28 Apr 2021 21:11:26 GMT, Sandhya Viswanathan wrote: >> This PR contains Short Vector Math Library support related changes for >> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), >> in preparation for when targeted. >> >> Intel Short Vector Math Library (SVML) based intrinsics in native x86 >> assembly provide optimized implementation for Vector API transcendental and >> trigonometric methods. >> These methods are built into a separate library instead of being part of >> libjvm.so or jvm.dll. >> >> The following changes are made: >>The source for these methods is placed in the jdk.incubator.vector module >> under src/jdk.incubator.vector/linux/native/libsvml and >> src/jdk.incubator.vector/windows/native/libsvml. >>The assembly source files are named as “*.S” and include files are named >> as “*.S.inc”. >>The corresponding build script is placed at >> make/modules/jdk.incubator.vector/Lib.gmk. >>Changes are made to build system to support dependency tracking for >> assembly files with includes. >>The built native libraries (libsvml.so/svml.dll) are placed in bin >> directory of JDK on Windows and lib directory of JDK on Linux. >>The C2 JIT uses the dll_load and dll_lookup to get the addresses of >> optimized methods from this library. >> >> Build system changes and module library build scripts are contributed by >> Magnus (magnus.ihse.bur...@oracle.com). >> >> Looking forward to your review and feedback. >> >> Performance: >> Micro benchmark Base Optimized Unit Gain(Optimized/Base) >> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 >> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 >> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 >> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 >> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 >> Double128Vector.COS 49.94 245.89 ops/ms 4.92 >> Double128Vector.COSH 26.91 126.00 ops/ms 4.68 >> Double128Vector.EXP 71.64 379.65 ops/ms 5.30 >> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 >> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 >> Double128Vector.LOG 61.95 279.84 ops/ms 4.52 >> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 >> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 >> Double128Vector.SIN 49.36 240.79 ops/ms 4.88 >> Double128Vector.SINH 26.59 103.75 ops/ms 3.90 >> Double128Vector.TAN 41.05 152.39 ops/ms 3.71 >> Double128Vector.TANH 45.29 169.53 ops/ms 3.74 >> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 >> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 >> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 >> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 >> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 >> Double256Vector.COS 58.26 389.77 ops/ms 6.69 >> Double256Vector.COSH 29.44 151.11 ops/ms 5.13 >> Double256Vector.EXP 86.67 564.68 ops/ms 6.52 >> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 >> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 >> Double256Vector.LOG 71.52 394.90 ops/ms 5.52 >> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 >> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 >> Double256Vector.SIN 57.06 380.98 ops/ms 6.68 >> Double256Vector.SINH 29.40 117.37 ops/ms 3.99 >> Double256Vector.TAN 44.90 279.90 ops/ms 6.23 >> Double256Vector.TANH 54.08 274.71 ops/ms 5.08 >> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 >> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 >> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 >> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 >> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 >> Double512Vector.COS 59.88 837.04 ops/ms 13.98 >> Double512Vector.COSH 30.34 172.76 ops/ms 5.70 >> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 >> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 >> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 >> Double512Vector.LOG 74.84 996.00 ops/ms 13.31 >> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 >> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 >> Double512Vector.POW 37.42 384.13 ops/ms 10.26 >> Double512Vector.SIN 59.74 728.45 ops/ms 12.19 >> Double512Vector.SINH 29.47 143.38 ops/ms 4.87 >> Double512Vector.TAN 46.20 587.21 ops/ms 12.71 >> Double512Vector.TANH 57.36 495.42 ops/ms 8.64 >> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 >> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 >> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 >> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 >> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 >> Double64Vector.COS 23.42 152.01 ops/ms 6.49 >> Double64Vector
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]
On Mon, 3 May 2021 21:41:26 GMT, Paul Sandoz wrote: >> Sandhya Viswanathan has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains six commits: >> >> - Merge master >> - remove whitespace >> - Merge master >> - Small fix >> - cleanup >> - x86 short vector math optimization for Vector API > > Tier 1 to 3 tests pass for the default set of build profiles. @PaulSandoz Thanks a lot for running through the tests. - PR: https://git.openjdk.java.net/jdk/pull/3638
Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]
35 ops/ms 4.69 > Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 > Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 > Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 > Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 > Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 > Float128Vector.COS 42.82 803.02 ops/ms 18.75 > Float128Vector.COSH 31.44 118.34 ops/ms 3.76 > Float128Vector.EXP 72.43 855.33 ops/ms 11.81 > Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 > Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 > Float128Vector.LOG 52.95 877.94 ops/ms 16.58 > Float128Vector.LOG10 49.26 603.72 ops/ms 12.26 > Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61 > Float128Vector.SIN 43.38 745.31 ops/ms 17.18 > Float128Vector.SINH 31.11 112.91 ops/ms 3.63 > Float128Vector.TAN 37.25 332.13 ops/ms 8.92 > Float128Vector.TANH 57.63 453.77 ops/ms 7.87 > Float256Vector.ACOS 65.23 123.73 ops/ms 1.90 > Float256Vector.ASIN 63.41 132.86 ops/ms 2.10 > Float256Vector.ATAN 23.51 649.02 ops/ms 27.61 > Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07 > Float256Vector.CBRT 45.99 594.81 ops/ms 12.93 > Float256Vector.COS 43.75 926.69 ops/ms 21.18 > Float256Vector.COSH 33.52 130.46 ops/ms 3.89 > Float256Vector.EXP 75.70 1366.72 ops/ms 18.05 > Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84 > Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34 > Float256Vector.LOG 53.31 1545.77 ops/ms 29.00 > Float256Vector.LOG10 50.31 863.80 ops/ms 17.17 > Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66 > Float256Vector.SIN 44.07 911.04 ops/ms 20.67 > Float256Vector.SINH 33.16 122.50 ops/ms 3.69 > Float256Vector.TAN 37.85 497.75 ops/ms 13.15 > Float256Vector.TANH 64.27 537.20 ops/ms 8.36 > Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52 > Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93 > Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69 > Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57 > Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11 > Float512Vector.COS 40.92 1567.93 ops/ms 38.32 > Float512Vector.COSH 33.42 138.36 ops/ms 4.14 > Float512Vector.EXP 70.51 3835.97 ops/ms 54.41 > Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35 > Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47 > Float512Vector.LOG 49.61 3156.99 ops/ms 63.64 > Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02 > Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81 > Float512Vector.POW 32.73 1015.85 ops/ms 31.04 > Float512Vector.SIN 41.17 1587.71 ops/ms 38.56 > Float512Vector.SINH 33.05 129.39 ops/ms 3.91 > Float512Vector.TAN 35.60 1336.11 ops/ms 37.53 > Float512Vector.TANH 65.77 2295.28 ops/ms 34.90 > Float64Vector.ACOS 48.41 89.34 ops/ms 1.85 > Float64Vector.ASIN 47.30 95.72 ops/ms 2.02 > Float64Vector.ATAN 20.62 49.45 ops/ms 2.40 > Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04 > Float64Vector.CBRT 24.03 134.57 ops/ms 5.60 > Float64Vector.COS 44.28 394.33 ops/ms 8.91 > Float64Vector.COSH 28.35 95.27 ops/ms 3.36 > Float64Vector.EXP 65.80 486.37 ops/ms 7.39 > Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48 > Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93 > Float64Vector.LOG 51.93 163.25 ops/ms 3.14 > Float64Vector.LOG10 49.53 147.98 ops/ms 2.99 > Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77 > Float64Vector.SIN 44.41 382.09 ops/ms 8.60 > Float64Vector.SINH 28.20 90.68 ops/ms 3.22 > Float64Vector.TAN 36.29 160.89 ops/ms 4.43 > Float64Vector.TANH 47.65 214.04 ops/ms 4.49 Sandhya Viswanathan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge master - remove whitespace - Merge master - Small fix - cleanup - x86 short vector math optimization for Vector API - Changes: https://git.openjdk.java.net/jdk/pull/3638/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=01 Stats: 417102 lines in 120 files changed: 416935 ins; 123 del; 44 mod Patch: https://git.openjdk.java.net/jdk/pull/3638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638 PR: https://git.openjdk.java.net/jdk/pull/3638
RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics
Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly provide optimized implementation for Vector API transcendental and trigonometric methods. These methods are built into a separate library instead of being part of libjvm.so or jvm.dll. The following changes are made: The source for these methods is placed in the jdk.incubator.vector module under src/jdk.incubator.vector/linux/native/libsvml and src/jdk.incubator.vector/windows/native/libsvml. The assembly source files are named as “*.S” and include files are named as “*.S.inc”. The corresponding build script is placed at make/modules/jdk.incubator.vector/Lib.gmk. Changes are made to build system to support dependency tracking for assembly files with includes. The built native libraries (libsvml.so/svml.dll) are placed in bin directory of JDK on Windows and lib directory of JDK on Linux. The C2 JIT uses the dll_load and dll_lookup to get the addresses of optimized methods from this library. Build system changes and module library build scripts are contributed by Magnus (magnus.ihse.bur...@oracle.com). This work is part of second round of incubation of the Vector API. JEP: https://bugs.openjdk.java.net/browse/JDK-8261663 Please review. Performance: Micro benchmark Base Optimized Unit Gain(Optimized/Base) Double128Vector.ACOS 45.91 87.34 ops/ms 1.90 Double128Vector.ASIN 45.06 92.36 ops/ms 2.05 Double128Vector.ATAN 19.92 118.36 ops/ms 5.94 Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79 Double128Vector.CBRT 45.77 208.36 ops/ms 4.55 Double128Vector.COS 49.94 245.89 ops/ms 4.92 Double128Vector.COSH 26.91 126.00 ops/ms 4.68 Double128Vector.EXP 71.64 379.65 ops/ms 5.30 Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18 Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44 Double128Vector.LOG 61.95 279.84 ops/ms 4.52 Double128Vector.LOG10 59.34 239.05 ops/ms 4.03 Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79 Double128Vector.SIN 49.36 240.79 ops/ms 4.88 Double128Vector.SINH 26.59 103.75 ops/ms 3.90 Double128Vector.TAN 41.05 152.39 ops/ms 3.71 Double128Vector.TANH 45.29 169.53 ops/ms 3.74 Double256Vector.ACOS 54.21 106.39 ops/ms 1.96 Double256Vector.ASIN 53.60 107.99 ops/ms 2.01 Double256Vector.ATAN 21.53 189.11 ops/ms 8.78 Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44 Double256Vector.CBRT 56.45 397.13 ops/ms 7.04 Double256Vector.COS 58.26 389.77 ops/ms 6.69 Double256Vector.COSH 29.44 151.11 ops/ms 5.13 Double256Vector.EXP 86.67 564.68 ops/ms 6.52 Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80 Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62 Double256Vector.LOG 71.52 394.90 ops/ms 5.52 Double256Vector.LOG10 65.43 362.32 ops/ms 5.54 Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05 Double256Vector.SIN 57.06 380.98 ops/ms 6.68 Double256Vector.SINH 29.40 117.37 ops/ms 3.99 Double256Vector.TAN 44.90 279.90 ops/ms 6.23 Double256Vector.TANH 54.08 274.71 ops/ms 5.08 Double512Vector.ACOS 55.65 687.54 ops/ms 12.35 Double512Vector.ASIN 57.31 777.72 ops/ms 13.57 Double512Vector.ATAN 21.42 729.21 ops/ms 34.04 Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32 Double512Vector.CBRT 56.78 834.38 ops/ms 14.69 Double512Vector.COS 59.88 837.04 ops/ms 13.98 Double512Vector.COSH 30.34 172.76 ops/ms 5.70 Double512Vector.EXP 99.66 1608.12 ops/ms 16.14 Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34 Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34 Double512Vector.LOG 74.84 996.00 ops/ms 13.31 Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72 Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34 Double512Vector.POW 37.42 384.13 ops/ms 10.26 Double512Vector.SIN 59.74 728.45 ops/ms 12.19 Double512Vector.SINH 29.47 143.38 ops/ms 4.87 Double512Vector.TAN 46.20 587.21 ops/ms 12.71 Double512Vector.TANH 57.36 495.42 ops/ms 8.64 Double64Vector.ACOS 24.04 73.67 ops/ms 3.06 Double64Vector.ASIN 23.78 75.11 ops/ms 3.16 Double64Vector.ATAN 14.14 62.81 ops/ms 4.44 Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28 Double64Vector.CBRT 16.47 107.50 ops/ms 6.53 Double64Vector.COS 23.42 152.01 ops/ms 6.49 Double64Vector.COSH 17.34 113.34 ops/ms 6.54 Double64Vector.EXP 27.08 203.53 ops/ms 7.52 Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15 Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59 Double64Vector.LOG 26.75 142.63 ops/ms 5.33 Double64Vector.LOG10 25.85 139.71 ops/ms 5.40 Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38 Double64Vector.SIN 23.28 146.91 ops/ms 6.31 Double64Vector.SINH 17.62 88.59 ops/ms 5.03 Double64Vector.TAN 21.00 86.43 ops/ms 4.12 Double64Vector.TANH 23.75 111.35 ops/ms 4.69 Float128Vector.ACOS 57.52 110.65 ops/ms 1.92 Float128Vector.ASIN 57.15 117.95 ops/ms 2.06 Float128Vector.ATAN 22.52 318.74 ops/ms 14.15 Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42 Float128Vector.CBRT 29.72 443.74 ops/ms 14.93 Float128Vector.COS 42.82 803.02 ops/ms 18.75 Float128Vector.COSH 31.44 118.34 ops/ms 3.76 Float128Vector.EXP 72.43 855.33 ops/ms 11.81 Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38 Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12 Float128Vector.LOG 52.95 877.94 ops/ms 16.5
Re: RFR: 8223347: Integration of Vector API (Incubator) [v4]
On Tue, 13 Oct 2020 21:29:52 GMT, Ekaterina Pavlova wrote: >> Build changes look good. > > There are several gc tests crashed in panama-vector tier3 testing which seems > are not observed in openjdk repo. > The crashes look like: > # assert(oopDesc::is_oop(obj)) failed: not an oop: 0xfff1 > # > # JRE version: Java(TM) SE Runtime Environment (16.0+3) (fastdebug build > 16-panama+3-216) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 16-panama+3-216, > mixed mode, sharing, tiered, compressed oops, > g1 gc, linux-amd64) # Problematic frame: > # V [libjvm.so+0xd8ef94] HandleArea::allocate_handle(oop)+0x144 > > and the issue is actually tracked by JDK-8233199. > > This issue needs to be at least analyzed before integrating Vector API. @katyapav Is the failure observed on vector-unstable branch of panama-vector? The code in this pull request is from vector-unstable branch. The bug report https://bugs.openjdk.java.net/browse/JDK-8233199 refers to repo-valhalla and not panama-vector:vector-unstable. @PaulSandoz is doing final testing of the pull request today before integration tomorrow hopefully. - PR: https://git.openjdk.java.net/jdk/pull/367