Re: RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency [v2]

2021-11-04 Thread Sandhya Viswanathan
On Fri, 5 Nov 2021 00:56:05 GMT, Vladimir Kozlov wrote:

>> Thanks a lot @vnkozlov.
>
> @sviswa7 testing passed; you can integrate.

Thanks a lot @vnkozlov for testing and review.
Thanks @erikj79 @PaulSandoz @magicus for the review.

-

PR: https://git.openjdk.java.net/jdk/pull/6265


Integrated: 8276025: Hotspot's libsvml.so may conflict with user dependency

2021-11-04 Thread Sandhya Viswanathan
On Thu, 4 Nov 2021 17:48:56 GMT, Sandhya Viswanathan wrote:

> This patch removes conflicts with libsvml.so distributed with Intel's MKL 
> library:
>   Renames exported symbols from __svml to __jsvml.
>   Renames library from libsvml.so to libjsvml.so.
>   Updates stubGenerator_x86_64.cpp accordingly to load libjsvml.so and
> the renamed symbols.
>   Updates tests to look for the new library.
> 
> Please review.
> 
> Best Regards,
> Sandhya

This pull request has now been integrated.

Changeset: 9ad4d3d0
Author:    Sandhya Viswanathan
URL:       https://git.openjdk.java.net/jdk/commit/9ad4d3d06bb356436d69af07726ef6727c500f59
Stats: 391989 lines in 123 files changed: 192755 ins; 192755 del; 6479 mod

8276025: Hotspot's libsvml.so may conflict with user dependency

Reviewed-by: kvn, erikj, psandoz, ihse

-

PR: https://git.openjdk.java.net/jdk/pull/6265


Re: RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency [v2]

2021-11-04 Thread Sandhya Viswanathan
> This patch removes conflicts with libsvml.so distributed with Intel's MKL 
> library:
>   Renames exported symbols from __svml to __jsvml.
>   Renames library from libsvml.so to libjsvml.so.
>   Updates stubGenerator_x86_64.cpp accordingly to load libjsvml.so and
> the renamed symbols.
>   Updates tests to look for the new library.
> 
> Please review.
> 
> Best Regards,
> Sandhya

Sandhya Viswanathan has updated the pull request incrementally with one 
additional commit since the last revision:

  change filename to jsvml*

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6265/files
  - new: https://git.openjdk.java.net/jdk/pull/6265/files/7c488c10..70d962ae

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=6265&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=6265&range=00-01

  Stats: 0 lines in 72 files changed: 0 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6265.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6265/head:pull/6265

PR: https://git.openjdk.java.net/jdk/pull/6265


Re: RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency

2021-11-04 Thread Sandhya Viswanathan
On Thu, 4 Nov 2021 18:11:41 GMT, Vladimir Kozlov wrote:

>> This patch removes conflicts with libsvml.so distributed with Intel's MKL 
>> library:
>>   Renames exported symbols from __svml to __jsvml.
>>   Renames library from libsvml.so to libjsvml.so.
>>   Updates stubGenerator_x86_64.cpp accordingly to load libjsvml.so and
>> the renamed symbols.
>>   Updates tests to look for the new library.
>> 
>> Please review.
>> 
>> Best Regards,
>> Sandhya
>
> Looks good. I will run tests.

Thanks a lot @vnkozlov.

-

PR: https://git.openjdk.java.net/jdk/pull/6265


RFR: 8276025: Hotspot's libsvml.so may conflict with user dependency

2021-11-04 Thread Sandhya Viswanathan
This patch removes conflicts with libsvml.so distributed with Intel's MKL 
library:
  Renames exported symbols from __svml to __jsvml.
  Renames library from libsvml.so to libjsvml.so.
  Updates stubGenerator_x86_64.cpp accordingly to load libjsvml.so and the
renamed symbols.
  Updates tests to look for the new library.
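
To make the two renames above concrete, here is a minimal Java sketch. It is only an illustration: the class `SvmlRenameSketch`, the helper `renameSymbol`, and the sample symbol name `__svml_expf4` are assumptions made for this example, not code from the patch. `System.mapLibraryName` is the standard JDK call that maps a base name to the platform's library file name.

```java
final class SvmlRenameSketch {
    // Prefixes as named in the patch description.
    static final String OLD_PREFIX = "__svml";
    static final String NEW_PREFIX = "__jsvml";

    // Hypothetical helper: rewrite the prefix of an exported symbol name.
    static String renameSymbol(String symbol) {
        if (symbol.startsWith(OLD_PREFIX)) {
            return NEW_PREFIX + symbol.substring(OLD_PREFIX.length());
        }
        return symbol; // not an SVML symbol, leave it untouched
    }

    public static void main(String[] args) {
        // Illustrative symbol name only.
        System.out.println(renameSymbol("__svml_expf4"));
        // Platform-specific file names for the old and new base names.
        System.out.println(System.mapLibraryName("svml"));
        System.out.println(System.mapLibraryName("jsvml"));
    }
}
```

On Linux, `System.mapLibraryName("jsvml")` yields `libjsvml.so`, so a `libsvml.so` shipped by another product no longer shares a file name with the JDK's copy.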

Please review.

Best Regards,
Sandhya

-

Commit messages:
 - load svml test fixes
 - update tests
 - 8276025: Hotspot's libsvml.so may conflict with user dependency

Changes: https://git.openjdk.java.net/jdk/pull/6265/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=6265&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276025
  Stats: 391989 lines in 123 files changed: 192755 ins; 192755 del; 6479 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6265.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6265/head:pull/6265

PR: https://git.openjdk.java.net/jdk/pull/6265


Re: RFR: 8273459: Update code segment alignment to 64 bytes [v4]

2021-09-28 Thread Sandhya Viswanathan
On Tue, 28 Sep 2021 17:31:24 GMT, Scott Gibbons wrote:

>> Change the default code entry alignment to 64 bytes from 32 bytes.  This 
>> allows for maintaining proper 64-byte alignment of data within a code 
>> segment, which is required by several AVX-512 instructions.
>> 
>> I ran into this while implementing Base64 encoding and decoding.  Code 
>> segments which were allocated with the address mod 32 == 0 but with the 
>> address mod 64 != 0 would cause the align() macro to misalign.  This is 
>> because the align macro aligns to the size of the code segment and not the 
>> offset of the PC.  So align(64) would align the PC to a multiple of 64 bytes 
>> from the start of the segment, and not to a pure 64-byte boundary as 
>> requested.  Changing the alignment of the segment to 64 bytes fixes the 
>> issue.
>> 
>> I have not seen any measurable difference in either performance or memory 
>> usage with the tests I have run.
>> 
>> See [this](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054180.html)
>> article for the discussion thread.
>
> Scott Gibbons has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - Merge branch 'asgibbons-align-fix' of https://github.com/asgibbons/jdk 
> into asgibbons-align-fix
>  - Revert .gitignore. Add comments and assert in align().

The updated patch looks good to me as well.
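
The commit note above mentions an assert in align(). As a rough sketch of what such a guard could look like (not the actual HotSpot code), the Java below computes padding from a segment offset and rejects any alignment larger than the segment's own base alignment. The class name, the method `paddingFor`, and the constant value 64 are assumptions for illustration.

```java
final class AlignGuardSketch {
    // Assumed segment alignment for this sketch; HotSpot exposes the real value as a VM flag.
    static final int CODE_ENTRY_ALIGNMENT = 64;

    // Returns the number of padding bytes so that (offset + padding) % modulus == 0.
    // Rejects a modulus that the segment base alignment cannot guarantee.
    static int paddingFor(int offset, int modulus) {
        if (modulus <= 0 || (modulus & (modulus - 1)) != 0) {
            throw new IllegalArgumentException("modulus must be a power of two: " + modulus);
        }
        if (modulus > CODE_ENTRY_ALIGNMENT) {
            throw new IllegalStateException(
                "alignment " + modulus + " exceeds segment alignment " + CODE_ENTRY_ALIGNMENT);
        }
        return (modulus - (offset & (modulus - 1))) & (modulus - 1);
    }

    public static void main(String[] args) {
        System.out.println(paddingFor(10, 64));
        try {
            paddingFor(10, 256);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the guard is that offset-relative padding only produces a truly aligned address when the segment base is itself aligned to at least the requested value.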

-

Marked as reviewed by sviswanathan (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5547


Re: RFR: 8273459: Update code segment alignment to 64 bytes [v4]

2021-09-28 Thread Sandhya Viswanathan
On Tue, 28 Sep 2021 17:31:24 GMT, Scott Gibbons wrote:

>> Change the default code entry alignment to 64 bytes from 32 bytes.  This 
>> allows for maintaining proper 64-byte alignment of data within a code 
>> segment, which is required by several AVX-512 instructions.
>> 
>> I ran into this while implementing Base64 encoding and decoding.  Code 
>> segments which were allocated with the address mod 32 == 0 but with the 
>> address mod 64 != 0 would cause the align() macro to misalign.  This is 
>> because the align macro aligns to the size of the code segment and not the 
>> offset of the PC.  So align(64) would align the PC to a multiple of 64 bytes 
>> from the start of the segment, and not to a pure 64-byte boundary as 
>> requested.  Changing the alignment of the segment to 64 bytes fixes the 
>> issue.
>> 
>> I have not seen any measurable difference in either performance or memory 
>> usage with the tests I have run.
>> 
>> See [this](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054180.html)
>> article for the discussion thread.
>
> Scott Gibbons has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - Merge branch 'asgibbons-align-fix' of https://github.com/asgibbons/jdk 
> into asgibbons-align-fix
>  - Revert .gitignore. Add comments and assert in align().

Looks good to me as well.

-

PR: https://git.openjdk.java.net/jdk/pull/5547


Re: RFR: 8273459: Update code segment alignment to 64 bytes

2021-09-27 Thread Sandhya Viswanathan
On Fri, 17 Sep 2021 14:00:44 GMT, Scott Gibbons wrote:

>> Change the default code entry alignment to 64 bytes from 32 bytes.  This 
>> allows for maintaining proper 64-byte alignment of data within a code 
>> segment, which is required by several AVX-512 instructions.
>> 
>> I ran into this while implementing Base64 encoding and decoding.  Code 
>> segments which were allocated with the address mod 32 == 0 but with the 
>> address mod 64 != 0 would cause the align() macro to misalign.  This is 
>> because the align macro aligns to the size of the code segment and not the 
>> offset of the PC.  So align(64) would align the PC to a multiple of 64 bytes 
>> from the start of the segment, and not to a pure 64-byte boundary as 
>> requested.  Changing the alignment of the segment to 64 bytes fixes the 
>> issue.
>> 
>> I have not seen any measurable difference in either performance or memory 
>> usage with the tests I have run.
>> 
>> See [this](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054180.html)
>> article for the discussion thread.
>
> I think I have not made the point clearly enough.  The `align` function is 
> used to manipulate the address bits for the byte following the `align()`.  
> This means that wherever the code is copied, the address of that byte should 
> have the appropriate address bit configuration in the copy (as well as the 
> original, of course).  Since the current implementation is using the base 
> address of the allocated segment to determine alignment, the only way to 
> ensure the proper bit configuration of the address is to ensure the base 
> address of the newly-allocated segment is aligned identically to the original.
> 
> I believe this is entirely independent of `MaxVectorSize`, so I don't believe 
> it's appropriate to use this value for address alignment.  Using `pc()` fixes 
> the case in the source segment, but will break 50% of the time when the 
> segment is copied with a `CodeEntryAlignment` of 32.
> 
> I think the bottom line is that `align()` is broken for any value greater 
> than `CodeEntryAlignment`.  I can foresee a case where it may be beneficial 
> (from an algorithm perspective) to have large alignment values, like 
> align(256) to simplify pointer arithmetic (for example).  All of these 
> proposed changes will not ensure this alignment when a segment is copied.
> 
> Perhaps the appropriate thing to do is to put an `assert()` in `align()` to 
> fail if the requested alignment cannot be ensured?
> 
> IMHO, the "right" thing to do is to mark the bytes requiring address 
> alignment and handle the cases on copy.  This would add significant 
> complexity, however.

@asgibbons To me, Vladimir Kozlov's suggestion of adding an align64() method
that calls pc(), as you originally proposed, looks best. It meets our purpose
and is limited in scope.
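
The difference between the two approaches discussed in this thread can be worked through with plain address arithmetic. The Java sketch below is an illustration under assumed values: the base addresses 0x1020 and 0x1040, the offset 10, and all class and method names are made up for the example. `alignByOffset` pads relative to the segment start, while `alignByPc` pads relative to the absolute address.

```java
final class AlignOffsetVsPc {
    // Round value up to the next multiple of modulus (modulus is a power of two).
    static long alignUp(long value, long modulus) {
        return (value + modulus - 1) & -modulus;
    }

    // Offset-relative alignment: the distance from the segment base becomes a multiple of modulus.
    static long alignByOffset(long base, long offset, long modulus) {
        return base + alignUp(offset, modulus);
    }

    // Address-based alignment: the absolute address becomes a multiple of modulus.
    static long alignByPc(long base, long offset, long modulus) {
        return alignUp(base + offset, modulus);
    }

    public static void main(String[] args) {
        long base32 = 0x1020L; // multiple of 32, but not of 64
        long base64 = 0x1040L; // multiple of 64
        long offset = 10;

        long byOffset = alignByOffset(base32, offset, 64);
        long byPc = alignByPc(base32, offset, 64);
        System.out.printf("offset-based: 0x%x (mod 64 = %d)%n", byOffset, byOffset % 64);
        System.out.printf("pc-based:     0x%x (mod 64 = %d)%n", byPc, byPc % 64);

        // Copy the pc-aligned code to a segment with a different base alignment:
        // the byte keeps its offset, so its new address is no longer aligned.
        long offsetInSegment = byPc - base32;
        long afterCopy = base64 + offsetInSegment;
        System.out.printf("after copy:   0x%x (mod 64 = %d)%n", afterCopy, afterCopy % 64);
    }
}
```

With a base that is only 32-byte aligned, the offset-based result lands 32 bytes off a 64-byte boundary, and the pc-based result is correct in place but loses its alignment once the code is copied to a segment whose base differs modulo 64. That is the trade-off the quoted message describes.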

-

PR: https://git.openjdk.java.net/jdk/pull/5547


Re: RFR: 8268276: Base64 Decoding optimization for x86 using AVX-512 [v7]

2021-06-24 Thread Sandhya Viswanathan
On Thu, 24 Jun 2021 14:50:01 GMT, Vladimir Kozlov wrote:

>> Scott Gibbons has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Fixing Windows build warnings
>
> The rest of testing hs-tier1-4 and xcomp is finished and clean.
> So this is the only failure. I attached hs_err file to RFE.

Thanks a lot @vnkozlov for the review and test.

-

PR: https://git.openjdk.java.net/jdk/pull/4368


Re: RFR: 8268276: Base64 Decoding optimization for x86 using AVX-512 [v6]

2021-06-22 Thread Sandhya Viswanathan
On Tue, 22 Jun 2021 20:47:55 GMT, Scott Gibbons wrote:

>> Add the Base64 Decode intrinsic for x86 to utilize AVX-512 for acceleration. 
>> Also allows for performance improvement for non-AVX-512 enabled platforms. 
>> Due to the nature of MIME-encoded inputs, modify the intrinsic signature to 
>> accept an additional parameter (isMIME) for fast-path MIME decoding.
>> 
>> A change was made to the signature of DecodeBlock in Base64.java to provide 
>> the intrinsic information as to whether MIME decoding was being done.  This 
>> allows for the intrinsic to bypass the expensive setup of zmm registers from 
>> AVX tables, knowing there may be invalid Base64 characters every 76 
>> characters or so.  A change was also made here removing the restriction that 
>> the intrinsic must return an even multiple of 3 bytes decoded.  This 
>> implementation handles the pad characters at the end of the string and will 
>> return the actual number of characters decoded.
>> 
>> The AVX portion of this code will decode in blocks of 256 bytes per loop 
>> iteration, then in chunks of 64 bytes, followed by end fixup decoding.  The 
>> non-AVX code is an assembly-optimized version of the java DecodeBlock and 
>> behaves identically.
>> 
>> Running the Base64Decode benchmark, this change increases decode performance 
>> by an average of 2.6x with a maximum 19.7x for buffers > ~20k.  The numbers 
>> are given in the table below.
>> 
>> **Base Score** is without intrinsic support, **Optimized Score** is using 
>> this intrinsic, and **Gain** is **Base** / **Optimized**.
>> 
>> 
>> Benchmark Name | Base Score | Optimized Score | Gain
>> -- | -- | -- | --
>> testBase64Decode size 1 | 15.36 | 15.32 | 1.00
>> testBase64Decode size 3 | 17.00 | 16.72 | 1.02
>> testBase64Decode size 7 | 20.60 | 18.82 | 1.09
>> testBase64Decode size 32 | 34.21 | 26.77 | 1.28
>> testBase64Decode size 64 | 54.43 | 38.35 | 1.42
>> testBase64Decode size 80 | 66.40 | 48.34 | 1.37
>> testBase64Decode size 96 | 73.16 | 52.90 | 1.38
>> testBase64Decode size 112 | 84.93 | 51.82 | 1.64
>> testBase64Decode size 512 | 288.81 | 32.04 | 9.01
>> testBase64Decode size 1000 | 560.48 | 40.79 | 13.74
>> testBase64Decode size 2 | 9530.28 | 483.37 | 19.72
>> testBase64Decode size 5 | 24552.24 | 1735.07 | 14.15
>> testBase64MIMEDecode size 1 | 22.87 | 21.36 | 1.07
>> testBase64MIMEDecode size 3 | 27.79 | 25.32 | 1.10
>> testBase64MIMEDecode size 7 | 44.74 | 43.81 | 1.02
>> testBase64MIMEDecode size 32 | 142.69 | 129.56 | 1.10
>> testBase64MIMEDecode size 64 | 256.90 | 243.80 | 1.05
>> testBase64MIMEDecode size 80 | 311.60 | 310.80 | 1.00
>> testBase64MIMEDecode size 96 | 364.00 | 346.66 | 1.05
>> testBase64MIMEDecode size 112 | 472.88 | 394.78 | 1.20
>> testBase64MIMEDecode size 512 | 1814.96 | 1671.28 | 1.09
>> testBase64MIMEDecode size 1000 | 3623.50 | 3227.61 | 1.12
>> testBase64MIMEDecode size 2 | 70484.09 | 64940.77 | 1.09
>> testBase64MIMEDecode size 5 | 191732.34 | 158158.95 | 1.21
>> testBase64WithErrorInputsDecode size 1 | 1531.02 | 1185.19 | 1.29
>> testBase64WithErrorInputsDecode size 3 | 1306.59 | 1170.99 | 1.12
>> testBase64WithErrorInputsDecode size 7 | 1238.11 | 1176.62 | 1.05
>> testBase64WithErrorInputsDecode size 32 | 1346.46 | 1138.47 | 1.18
>> testBase64WithErrorInputsDecode size 64 | 1195.28 | 1172.52 | 1.02
>> testBase64WithErrorInputsDecode size 80 | 1469.00 | 1180.94 | 1.24
>> testBase64WithErrorInputsDecode size 96 | 1434.48 | 1167.74 | 1.23
>> testBase64WithErrorInputsDecode size 112 | 1440.06 | 1162.56 | 1.24
>> testBase64WithErrorInputsDecode size 512 | 1362.79 | 1193.42 | 1.14
>> testBase64WithErrorInputsDecode size 1000 | 1426.07 | 1194.44 | 1.19
>> testBase64WithErrorInputsDecode size   2 | 1398.44 | 1138.17 | 1.23
>> testBase64WithErrorInputsDecode size   5 | 1409.41 | 1114.16 | 1.26
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Addressing review comments.
>   
>   1. Changed errorvec handling
>   2. Removed unnecessary register copies and aliasing
>   3. Streamlined mask generation

@asgibbons The patch looks good to me.

@vnkozlov We need one more review for this patch. Could you please help?
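
For readers unfamiliar with why MIME input gets its own fast path, the following Java sketch uses only the public `java.util.Base64` API to show the behaviour the description refers to: MIME-encoded text carries a line separator after every 76 characters, which the basic decoder treats as invalid input and the MIME decoder skips. The class and helper names are made up for this example, and it says nothing about how the intrinsic itself is implemented.

```java
import java.util.Arrays;
import java.util.Base64;

final class MimeDecodeSketch {
    // Deterministic sample payload of n bytes.
    static byte[] sample(int n) {
        byte[] data = new byte[n];
        for (int i = 0; i < n; i++) {
            data[i] = (byte) i;
        }
        return data;
    }

    static String mimeEncode(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    // True if the basic (non-MIME) decoder accepts the text.
    static boolean basicDecoderAccepts(String text) {
        try {
            Base64.getDecoder().decode(text);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] data = sample(100);
        String mime = mimeEncode(data);

        System.out.println("first line break at index " + mime.indexOf("\r\n"));
        System.out.println("basic decoder accepts MIME text: " + basicDecoderAccepts(mime));
        System.out.println("MIME round trip ok: "
            + Arrays.equals(Base64.getMimeDecoder().decode(mime), data));

        // Padded input: the decoded length need not be a multiple of 3.
        System.out.println("bytes decoded from \"QUI=\": "
            + Base64.getDecoder().decode("QUI=").length);
    }
}
```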

-

PR: https://git.openjdk.java.net/jdk/pull/4368


Re: RFR: 8268276: Base64 Decoding optimization for x86 using AVX-512 [v6]

2021-06-22 Thread Sandhya Viswanathan
On Tue, 22 Jun 2021 20:47:55 GMT, Scott Gibbons wrote:

>> Add the Base64 Decode intrinsic for x86 to utilize AVX-512 for acceleration. 
>> Also allows for performance improvement for non-AVX-512 enabled platforms. 
>> Due to the nature of MIME-encoded inputs, modify the intrinsic signature to 
>> accept an additional parameter (isMIME) for fast-path MIME decoding.
>> 
>> A change was made to the signature of DecodeBlock in Base64.java to provide 
>> the intrinsic information as to whether MIME decoding was being done.  This 
>> allows for the intrinsic to bypass the expensive setup of zmm registers from 
>> AVX tables, knowing there may be invalid Base64 characters every 76 
>> characters or so.  A change was also made here removing the restriction that 
>> the intrinsic must return an even multiple of 3 bytes decoded.  This 
>> implementation handles the pad characters at the end of the string and will 
>> return the actual number of characters decoded.
>> 
>> The AVX portion of this code will decode in blocks of 256 bytes per loop 
>> iteration, then in chunks of 64 bytes, followed by end fixup decoding.  The 
>> non-AVX code is an assembly-optimized version of the java DecodeBlock and 
>> behaves identically.
>> 
>> Running the Base64Decode benchmark, this change increases decode performance 
>> by an average of 2.6x with a maximum 19.7x for buffers > ~20k.  The numbers 
>> are given in the table below.
>> 
>> **Base Score** is without intrinsic support, **Optimized Score** is using 
>> this intrinsic, and **Gain** is **Base** / **Optimized**.
>> 
>> 
>> Benchmark Name | Base Score | Optimized Score | Gain
>> -- | -- | -- | --
>> testBase64Decode size 1 | 15.36 | 15.32 | 1.00
>> testBase64Decode size 3 | 17.00 | 16.72 | 1.02
>> testBase64Decode size 7 | 20.60 | 18.82 | 1.09
>> testBase64Decode size 32 | 34.21 | 26.77 | 1.28
>> testBase64Decode size 64 | 54.43 | 38.35 | 1.42
>> testBase64Decode size 80 | 66.40 | 48.34 | 1.37
>> testBase64Decode size 96 | 73.16 | 52.90 | 1.38
>> testBase64Decode size 112 | 84.93 | 51.82 | 1.64
>> testBase64Decode size 512 | 288.81 | 32.04 | 9.01
>> testBase64Decode size 1000 | 560.48 | 40.79 | 13.74
>> testBase64Decode size 2 | 9530.28 | 483.37 | 19.72
>> testBase64Decode size 5 | 24552.24 | 1735.07 | 14.15
>> testBase64MIMEDecode size 1 | 22.87 | 21.36 | 1.07
>> testBase64MIMEDecode size 3 | 27.79 | 25.32 | 1.10
>> testBase64MIMEDecode size 7 | 44.74 | 43.81 | 1.02
>> testBase64MIMEDecode size 32 | 142.69 | 129.56 | 1.10
>> testBase64MIMEDecode size 64 | 256.90 | 243.80 | 1.05
>> testBase64MIMEDecode size 80 | 311.60 | 310.80 | 1.00
>> testBase64MIMEDecode size 96 | 364.00 | 346.66 | 1.05
>> testBase64MIMEDecode size 112 | 472.88 | 394.78 | 1.20
>> testBase64MIMEDecode size 512 | 1814.96 | 1671.28 | 1.09
>> testBase64MIMEDecode size 1000 | 3623.50 | 3227.61 | 1.12
>> testBase64MIMEDecode size 2 | 70484.09 | 64940.77 | 1.09
>> testBase64MIMEDecode size 5 | 191732.34 | 158158.95 | 1.21
>> testBase64WithErrorInputsDecode size 1 | 1531.02 | 1185.19 | 1.29
>> testBase64WithErrorInputsDecode size 3 | 1306.59 | 1170.99 | 1.12
>> testBase64WithErrorInputsDecode size 7 | 1238.11 | 1176.62 | 1.05
>> testBase64WithErrorInputsDecode size 32 | 1346.46 | 1138.47 | 1.18
>> testBase64WithErrorInputsDecode size 64 | 1195.28 | 1172.52 | 1.02
>> testBase64WithErrorInputsDecode size 80 | 1469.00 | 1180.94 | 1.24
>> testBase64WithErrorInputsDecode size 96 | 1434.48 | 1167.74 | 1.23
>> testBase64WithErrorInputsDecode size 112 | 1440.06 | 1162.56 | 1.24
>> testBase64WithErrorInputsDecode size 512 | 1362.79 | 1193.42 | 1.14
>> testBase64WithErrorInputsDecode size 1000 | 1426.07 | 1194.44 | 1.19
>> testBase64WithErrorInputsDecode size   2 | 1398.44 | 1138.17 | 1.23
>> testBase64WithErrorInputsDecode size   5 | 1409.41 | 1114.16 | 1.26
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Addressing review comments.
>   
>   1. Changed errorvec handling
>   2. Removed unnecessary register copies and aliasing
>   3. Streamlined mask generation

Marked as reviewed by sviswanathan (Reviewer).
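
As a quick sanity check of the Gain column in the quoted table, the Java sketch below recomputes Gain as Base / Optimized for a few rows, using scores copied from the table. The class and method names are made up for this example, and the table does not state the unit of the scores, so none is assumed here.

```java
import java.util.Locale;

final class GainSketch {
    // Gain as defined in the quoted description: Base score divided by Optimized score.
    static double gain(double base, double optimized) {
        return base / optimized;
    }

    // Format with two decimals, matching the precision used in the table.
    static String fmt(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    public static void main(String[] args) {
        // Rows copied from the quoted table: { base score, optimized score }.
        double[][] rows = {
            {288.81, 32.04},
            {560.48, 40.79},
            {9530.28, 483.37},
            {24552.24, 1735.07},
        };
        for (double[] row : rows) {
            System.out.println(row[0] + " / " + row[1] + " = " + fmt(gain(row[0], row[1])));
        }
    }
}
```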

-

PR: https://git.openjdk.java.net/jdk/pull/4368


Re: RFR: 8268276: Base64 Decoding optimization for x86 using AVX-512 [v5]

2021-06-19 Thread Sandhya Viswanathan
On Fri, 18 Jun 2021 22:12:11 GMT, Scott Gibbons wrote:

>> Add the Base64 Decode intrinsic for x86 to utilize AVX-512 for acceleration. 
>> Also allows for performance improvement for non-AVX-512 enabled platforms. 
>> Due to the nature of MIME-encoded inputs, modify the intrinsic signature to 
>> accept an additional parameter (isMIME) for fast-path MIME decoding.
>> 
>> A change was made to the signature of DecodeBlock in Base64.java to provide 
>> the intrinsic information as to whether MIME decoding was being done.  This 
>> allows for the intrinsic to bypass the expensive setup of zmm registers from 
>> AVX tables, knowing there may be invalid Base64 characters every 76 
>> characters or so.  A change was also made here removing the restriction that 
>> the intrinsic must return an even multiple of 3 bytes decoded.  This 
>> implementation handles the pad characters at the end of the string and will 
>> return the actual number of characters decoded.
>> 
>> The AVX portion of this code will decode in blocks of 256 bytes per loop 
>> iteration, then in chunks of 64 bytes, followed by end fixup decoding.  The 
>> non-AVX code is an assembly-optimized version of the java DecodeBlock and 
>> behaves identically.
>> 
>> Running the Base64Decode benchmark, this change increases decode performance 
>> by an average of 2.6x with a maximum 19.7x for buffers > ~20k.  The numbers 
>> are given in the table below.
>> 
>> **Base Score** is without intrinsic support, **Optimized Score** is using 
>> this intrinsic, and **Gain** is **Base** / **Optimized**.
>> 
>> 
>> Benchmark Name | Base Score | Optimized Score | Gain
>> -- | -- | -- | --
>> testBase64Decode size 1 | 15.36 | 15.32 | 1.00
>> testBase64Decode size 3 | 17.00 | 16.72 | 1.02
>> testBase64Decode size 7 | 20.60 | 18.82 | 1.09
>> testBase64Decode size 32 | 34.21 | 26.77 | 1.28
>> testBase64Decode size 64 | 54.43 | 38.35 | 1.42
>> testBase64Decode size 80 | 66.40 | 48.34 | 1.37
>> testBase64Decode size 96 | 73.16 | 52.90 | 1.38
>> testBase64Decode size 112 | 84.93 | 51.82 | 1.64
>> testBase64Decode size 512 | 288.81 | 32.04 | 9.01
>> testBase64Decode size 1000 | 560.48 | 40.79 | 13.74
>> testBase64Decode size 2 | 9530.28 | 483.37 | 19.72
>> testBase64Decode size 5 | 24552.24 | 1735.07 | 14.15
>> testBase64MIMEDecode size 1 | 22.87 | 21.36 | 1.07
>> testBase64MIMEDecode size 3 | 27.79 | 25.32 | 1.10
>> testBase64MIMEDecode size 7 | 44.74 | 43.81 | 1.02
>> testBase64MIMEDecode size 32 | 142.69 | 129.56 | 1.10
>> testBase64MIMEDecode size 64 | 256.90 | 243.80 | 1.05
>> testBase64MIMEDecode size 80 | 311.60 | 310.80 | 1.00
>> testBase64MIMEDecode size 96 | 364.00 | 346.66 | 1.05
>> testBase64MIMEDecode size 112 | 472.88 | 394.78 | 1.20
>> testBase64MIMEDecode size 512 | 1814.96 | 1671.28 | 1.09
>> testBase64MIMEDecode size 1000 | 3623.50 | 3227.61 | 1.12
>> testBase64MIMEDecode size 2 | 70484.09 | 64940.77 | 1.09
>> testBase64MIMEDecode size 5 | 191732.34 | 158158.95 | 1.21
>> testBase64WithErrorInputsDecode size 1 | 1531.02 | 1185.19 | 1.29
>> testBase64WithErrorInputsDecode size 3 | 1306.59 | 1170.99 | 1.12
>> testBase64WithErrorInputsDecode size 7 | 1238.11 | 1176.62 | 1.05
>> testBase64WithErrorInputsDecode size 32 | 1346.46 | 1138.47 | 1.18
>> testBase64WithErrorInputsDecode size 64 | 1195.28 | 1172.52 | 1.02
>> testBase64WithErrorInputsDecode size 80 | 1469.00 | 1180.94 | 1.24
>> testBase64WithErrorInputsDecode size 96 | 1434.48 | 1167.74 | 1.23
>> testBase64WithErrorInputsDecode size 112 | 1440.06 | 1162.56 | 1.24
>> testBase64WithErrorInputsDecode size 512 | 1362.79 | 1193.42 | 1.14
>> testBase64WithErrorInputsDecode size 1000 | 1426.07 | 1194.44 | 1.19
>> testBase64WithErrorInputsDecode size   2 | 1398.44 | 1138.17 | 1.23
>> testBase64WithErrorInputsDecode size   5 | 1409.41 | 1114.16 | 1.26
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Added comments.  Streamlined flow for decode.

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6155:

> 6153:   __ subl(output_size, length);
> 6154:   __ movq(rax, -1);
> 6155:   __ shrxq(rax, rax, output_size);// Input mask in rax

I think this could also be implemented as:
__ movq(rax, -1);
__ bzhiq(rax, rax, length);

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6173:

> 6171:   __ movq(rax, 64);
> 6172:   __ subq(rax, output_size);
> 6173:   __ shrxq(output_mask, output_mask, rax);

The output mask can also be computed using bzhiq:
__ movq(output_mask, -1);
__ bzhiq(output_mask, output_mask, output_size);
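
The equivalence behind the two bzhiq suggestions above can be checked with plain 64-bit arithmetic. The Java sketch below models both instruction sequences under stated assumptions: `maskByShift` stands for loading -1 and shifting right by 64 minus the bit count (as shrxq would, using only the low 6 bits of the count), and `bzhi` follows the documented behaviour of BZHI, which clears all bits at positions at or above the index and leaves the source unchanged when the index is 64 or more. The class and method names are made up for illustration.

```java
final class MaskSketch {
    // Models "movq reg, -1; shrxq reg, reg, (64 - n)".
    // Java's >>> on a long also uses only the low 6 bits of the shift count.
    static long maskByShift(int n) {
        return -1L >>> (64 - n);
    }

    // Models BZHI on a 64-bit operand: zero every bit at position >= index.
    // Only the low 8 bits of the index are used; an index of 64 or more keeps src as is.
    static long bzhi(long src, int index) {
        int n = index & 0xff;
        return n >= 64 ? src : src & ((1L << n) - 1);
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 8, 63, 64}) {
            System.out.printf("n=%2d shift=%016x bzhi=%016x%n",
                n, maskByShift(n), bzhi(-1L, n));
        }
    }
}
```

Under this model the two masks agree for every bit count from 1 to 64. They differ at 0: the shift count of 64 wraps to 0 and leaves all ones, while BZHI with index 0 produces an empty mask, so the zero-length case is worth checking when swapping one sequence for the other.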

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6243:

> 6241: 
> 6242:   __ BIND(L_padding);
> 6243:   __ decrementq(r13, 1);

It would be good to use output_size here instead of r13.

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6249:

> 6247:   __ jcc(Assembler::notEqual, L_donePadding);
> 6248: 
> 6249:   __ decrementq(r13, 1);

It will be good to use

Re: RFR: 8268276: Base64 Decoding optimization for x86 using AVX-512 [v5]

2021-06-18 Thread Sandhya Viswanathan
On Fri, 18 Jun 2021 22:12:11 GMT, Scott Gibbons wrote:

>> Add the Base64 Decode intrinsic for x86 to utilize AVX-512 for acceleration. 
>> Also allows for performance improvement for non-AVX-512 enabled platforms. 
>> Due to the nature of MIME-encoded inputs, modify the intrinsic signature to 
>> accept an additional parameter (isMIME) for fast-path MIME decoding.
>> 
>> A change was made to the signature of DecodeBlock in Base64.java to provide 
>> the intrinsic information as to whether MIME decoding was being done.  This 
>> allows for the intrinsic to bypass the expensive setup of zmm registers from 
>> AVX tables, knowing there may be invalid Base64 characters every 76 
>> characters or so.  A change was also made here removing the restriction that 
>> the intrinsic must return an even multiple of 3 bytes decoded.  This 
>> implementation handles the pad characters at the end of the string and will 
>> return the actual number of characters decoded.
>> 
>> The AVX portion of this code will decode in blocks of 256 bytes per loop 
>> iteration, then in chunks of 64 bytes, followed by end fixup decoding.  The 
>> non-AVX code is an assembly-optimized version of the java DecodeBlock and 
>> behaves identically.
>> 
>> Running the Base64Decode benchmark, this change increases decode performance 
>> by an average of 2.6x with a maximum 19.7x for buffers > ~20k.  The numbers 
>> are given in the table below.
>> 
>> **Base Score** is without intrinsic support, **Optimized Score** is using 
>> this intrinsic, and **Gain** is **Base** / **Optimized**.
>> 
>> 
>> Benchmark Name | Base Score | Optimized Score | Gain
>> -- | -- | -- | --
>> testBase64Decode size 1 | 15.36 | 15.32 | 1.00
>> testBase64Decode size 3 | 17.00 | 16.72 | 1.02
>> testBase64Decode size 7 | 20.60 | 18.82 | 1.09
>> testBase64Decode size 32 | 34.21 | 26.77 | 1.28
>> testBase64Decode size 64 | 54.43 | 38.35 | 1.42
>> testBase64Decode size 80 | 66.40 | 48.34 | 1.37
>> testBase64Decode size 96 | 73.16 | 52.90 | 1.38
>> testBase64Decode size 112 | 84.93 | 51.82 | 1.64
>> testBase64Decode size 512 | 288.81 | 32.04 | 9.01
>> testBase64Decode size 1000 | 560.48 | 40.79 | 13.74
>> testBase64Decode size 2 | 9530.28 | 483.37 | 19.72
>> testBase64Decode size 5 | 24552.24 | 1735.07 | 14.15
>> testBase64MIMEDecode size 1 | 22.87 | 21.36 | 1.07
>> testBase64MIMEDecode size 3 | 27.79 | 25.32 | 1.10
>> testBase64MIMEDecode size 7 | 44.74 | 43.81 | 1.02
>> testBase64MIMEDecode size 32 | 142.69 | 129.56 | 1.10
>> testBase64MIMEDecode size 64 | 256.90 | 243.80 | 1.05
>> testBase64MIMEDecode size 80 | 311.60 | 310.80 | 1.00
>> testBase64MIMEDecode size 96 | 364.00 | 346.66 | 1.05
>> testBase64MIMEDecode size 112 | 472.88 | 394.78 | 1.20
>> testBase64MIMEDecode size 512 | 1814.96 | 1671.28 | 1.09
>> testBase64MIMEDecode size 1000 | 3623.50 | 3227.61 | 1.12
>> testBase64MIMEDecode size 2 | 70484.09 | 64940.77 | 1.09
>> testBase64MIMEDecode size 5 | 191732.34 | 158158.95 | 1.21
>> testBase64WithErrorInputsDecode size 1 | 1531.02 | 1185.19 | 1.29
>> testBase64WithErrorInputsDecode size 3 | 1306.59 | 1170.99 | 1.12
>> testBase64WithErrorInputsDecode size 7 | 1238.11 | 1176.62 | 1.05
>> testBase64WithErrorInputsDecode size 32 | 1346.46 | 1138.47 | 1.18
>> testBase64WithErrorInputsDecode size 64 | 1195.28 | 1172.52 | 1.02
>> testBase64WithErrorInputsDecode size 80 | 1469.00 | 1180.94 | 1.24
>> testBase64WithErrorInputsDecode size 96 | 1434.48 | 1167.74 | 1.23
>> testBase64WithErrorInputsDecode size 112 | 1440.06 | 1162.56 | 1.24
>> testBase64WithErrorInputsDecode size 512 | 1362.79 | 1193.42 | 1.14
>> testBase64WithErrorInputsDecode size 1000 | 1426.07 | 1194.44 | 1.19
>> testBase64WithErrorInputsDecode size   2 | 1398.44 | 1138.17 | 1.23
>> testBase64WithErrorInputsDecode size   5 | 1409.41 | 1114.16 | 1.26
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Added comments.  Streamlined flow for decode.

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6004:

> 6002:   __ BIND(L_continue);
> 6003: 
> 6004:   __ vpxor(errorvec, errorvec, errorvec, Assembler::AVX_512bit);

Why is clearing errorvec needed here?

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6023:

> 6021:   __ evmovdquq(tmp16_op3, pack16_op, Assembler::AVX_512bit);
> 6022:   __ evmovdquq(tmp16_op2, pack16_op, Assembler::AVX_512bit);
> 6023:   __ evmovdquq(tmp16_op1, pack16_op, Assembler::AVX_512bit);

Why do we need 3 additional copies of pack16_op?

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6026:

> 6024:   __ evmovdquq(tmp32_op3, pack32_op, Assembler::AVX_512bit);
> 6025:   __ evmovdquq(tmp32_op2, pack32_op, Assembler::AVX_512bit);
> 6026:   __ evmovdquq(tmp32_op1, pack32_op, Assembler::AVX_512bit);

Why do we need 3 additional copies of pack32_op?

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 6051:

> 6049:   __ vpternlogd(t0, 0xfe, input1, input2, Ass

Re: RFR: 8268276: Base64 Decoding optimization for x86 using AVX-512 [v3]

2021-06-08 Thread Sandhya Viswanathan
On Tue, 8 Jun 2021 00:30:38 GMT, Scott Gibbons wrote:

>> Add the Base64 Decode intrinsic for x86 to utilize AVX-512 for acceleration. 
>> Also allows for performance improvement for non-AVX-512 enabled platforms. 
>> Due to the nature of MIME-encoded inputs, modify the intrinsic signature to 
>> accept an additional parameter (isMIME) for fast-path MIME decoding.
>> 
>> A change was made to the signature of DecodeBlock in Base64.java to provide 
>> the intrinsic information as to whether MIME decoding was being done.  This 
>> allows for the intrinsic to bypass the expensive setup of zmm registers from 
>> AVX tables, knowing there may be invalid Base64 characters every 76 
>> characters or so.  A change was also made here removing the restriction that 
>> the intrinsic must return an even multiple of 3 bytes decoded.  This 
>> implementation handles the pad characters at the end of the string and will 
>> return the actual number of characters decoded.
>> 
>> The AVX portion of this code will decode in blocks of 256 bytes per loop 
>> iteration, then in chunks of 64 bytes, followed by end fixup decoding.  The 
>> non-AVX code is an assembly-optimized version of the java DecodeBlock and 
>> behaves identically.
>> 
>> Running the Base64Decode benchmark, this change increases decode performance 
>> by an average of 2.6x with a maximum 19.7x for buffers > ~20k.  The numbers 
>> are given in the table below.
>> 
>> **Base Score** is without intrinsic support, **Optimized Score** is using 
>> this intrinsic, and **Gain** is **Base** / **Optimized**.
>> 
>> 
>> Benchmark Name | Base Score | Optimized Score | Gain
>> -- | -- | -- | --
>> testBase64Decode size 1 | 15.36 | 15.32 | 1.00
>> testBase64Decode size 3 | 17.00 | 16.72 | 1.02
>> testBase64Decode size 7 | 20.60 | 18.82 | 1.09
>> testBase64Decode size 32 | 34.21 | 26.77 | 1.28
>> testBase64Decode size 64 | 54.43 | 38.35 | 1.42
>> testBase64Decode size 80 | 66.40 | 48.34 | 1.37
>> testBase64Decode size 96 | 73.16 | 52.90 | 1.38
>> testBase64Decode size 112 | 84.93 | 51.82 | 1.64
>> testBase64Decode size 512 | 288.81 | 32.04 | 9.01
>> testBase64Decode size 1000 | 560.48 | 40.79 | 13.74
>> testBase64Decode size 2 | 9530.28 | 483.37 | 19.72
>> testBase64Decode size 5 | 24552.24 | 1735.07 | 14.15
>> testBase64MIMEDecode size 1 | 22.87 | 21.36 | 1.07
>> testBase64MIMEDecode size 3 | 27.79 | 25.32 | 1.10
>> testBase64MIMEDecode size 7 | 44.74 | 43.81 | 1.02
>> testBase64MIMEDecode size 32 | 142.69 | 129.56 | 1.10
>> testBase64MIMEDecode size 64 | 256.90 | 243.80 | 1.05
>> testBase64MIMEDecode size 80 | 311.60 | 310.80 | 1.00
>> testBase64MIMEDecode size 96 | 364.00 | 346.66 | 1.05
>> testBase64MIMEDecode size 112 | 472.88 | 394.78 | 1.20
>> testBase64MIMEDecode size 512 | 1814.96 | 1671.28 | 1.09
>> testBase64MIMEDecode size 1000 | 3623.50 | 3227.61 | 1.12
>> testBase64MIMEDecode size 2 | 70484.09 | 64940.77 | 1.09
>> testBase64MIMEDecode size 5 | 191732.34 | 158158.95 | 1.21
>> testBase64WithErrorInputsDecode size 1 | 1531.02 | 1185.19 | 1.29
>> testBase64WithErrorInputsDecode size 3 | 1306.59 | 1170.99 | 1.12
>> testBase64WithErrorInputsDecode size 7 | 1238.11 | 1176.62 | 1.05
>> testBase64WithErrorInputsDecode size 32 | 1346.46 | 1138.47 | 1.18
>> testBase64WithErrorInputsDecode size 64 | 1195.28 | 1172.52 | 1.02
>> testBase64WithErrorInputsDecode size 80 | 1469.00 | 1180.94 | 1.24
>> testBase64WithErrorInputsDecode size 96 | 1434.48 | 1167.74 | 1.23
>> testBase64WithErrorInputsDecode size 112 | 1440.06 | 1162.56 | 1.24
>> testBase64WithErrorInputsDecode size 512 | 1362.79 | 1193.42 | 1.14
>> testBase64WithErrorInputsDecode size 1000 | 1426.07 | 1194.44 | 1.19
>> testBase64WithErrorInputsDecode size   2 | 1398.44 | 1138.17 | 1.23
>> testBase64WithErrorInputsDecode size   5 | 1409.41 | 1114.16 | 1.26
>
> Scott Gibbons has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Fixing review comments.  Adding notes about isMIME parameter for other 
> architectures; clarifying decodeBlock comments.
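
For readers unfamiliar with why MIME input needs its own path: MIME-encoded text carries line separators that are not Base64 characters, so a strict decoder rejects them while the MIME decoder skips them. The following is a minimal sketch using only the public java.util.Base64 API; it does not exercise or describe the intrinsic itself, and the class and method names (MimeDecodeSketch, roundTripMime, basicDecoderRejects) are made up for illustration.

```java
import java.util.Arrays;
import java.util.Base64;

class MimeDecodeSketch {
    // Encode with the MIME encoder (lines of at most 76 characters separated
    // by CRLF), then decode with the MIME decoder, which ignores the separators.
    static byte[] roundTripMime(byte[] payload) {
        String mime = Base64.getMimeEncoder().encodeToString(payload);
        return Base64.getMimeDecoder().decode(mime);
    }

    // Returns true if the basic (non-MIME) decoder rejects the given text.
    static boolean basicDecoderRejects(String text) {
        try {
            Base64.getDecoder().decode(text);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // 100 input bytes encode to 136 Base64 characters, so the MIME
        // encoder has to insert one CRLF after the first 76 characters.
        byte[] payload = new byte[100];
        Arrays.fill(payload, (byte) 'x');
        String mime = Base64.getMimeEncoder().encodeToString(payload);

        System.out.println("contains CRLF: " + mime.contains("\r\n"));
        System.out.println("basic decoder rejects: " + basicDecoderRejects(mime));
        System.out.println("MIME round trip ok: "
                + Arrays.equals(payload, roundTripMime(payload)));
    }
}
```

The separators recur roughly every 76 characters, which matches the quoted remark that invalid Base64 characters may show up at about that interval in MIME input.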

@asgibbons Thanks a lot for contributing this. The performance gain is 
impressive. I have some minor comments. Please take a look.

src/hotspot/cpu/x86/assembler_x86.cpp line 4555:

> 4553: void Assembler::evpmaddubsw(XMMRegister dst, XMMRegister src1, 
> XMMRegister src2, int vector_len) {
> 4554:   assert(VM_Version::supports_avx512bw(), "");
> 4555:   InstructionAttr attributes(vector_len, /* rex_w */ false, /* 
> legacy_mode */ _legacy_mode_bw, /* no_mask_reg */ true, /* uses_vl */ true);

This instruction is also supported on AVX platforms. The assert check could be 
as follows:
  assert(vector_len == AVX_128bit ? VM_Version::supports_avx() :
         vector_len == AVX_256bit ? VM_Version::supports_avx2() :
         vector_len == AVX_512bit ? VM_Version::supports_avx512bw() : 0, "");
Accordingly, the instruction could be named vpmaddubsw.

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp l

Integrated: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics

2021-06-03 Thread Sandhya Viswanathan
On Thu, 22 Apr 2021 19:07:28 GMT, Sandhya Viswanathan 
 wrote:

> This PR contains Short Vector Math Library support related changes for 
> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), 
> in preparation for when targeted.
> 
> Intel Short Vector Math Library (SVML) based intrinsics in native x86 
> assembly provide optimized implementation for Vector API transcendental and 
> trigonometric methods.
> These methods are built into a separate library instead of being part of 
> libjvm.so or jvm.dll.
> 
> The following changes are made:
> - The source for these methods is placed in the jdk.incubator.vector module 
>   under src/jdk.incubator.vector/linux/native/libsvml and 
>   src/jdk.incubator.vector/windows/native/libsvml.
> - The assembly source files are named “*.S” and include files are named 
>   “*.S.inc”.
> - The corresponding build script is placed at 
>   make/modules/jdk.incubator.vector/Lib.gmk.
> - Changes are made to the build system to support dependency tracking for 
>   assembly files with includes.
> - The built native libraries (libsvml.so/svml.dll) are placed in the bin 
>   directory of the JDK on Windows and the lib directory of the JDK on Linux.
> - The C2 JIT uses dll_load and dll_lookup to get the addresses of the 
>   optimized methods from this library.
> 
> Build system changes and module library build scripts are contributed by 
> Magnus (magnus.ihse.bur...@oracle.com).
> 
> Looking forward to your review and feedback.
> 
> Performance:
> Micro benchmark Base Optimized Unit Gain(Optimized/Base)
> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05
> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79
> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55
> Double128Vector.COS 49.94 245.89 ops/ms 4.92
> Double128Vector.COSH 26.91 126.00 ops/ms 4.68
> Double128Vector.EXP 71.64 379.65 ops/ms 5.30
> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18
> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44
> Double128Vector.LOG 61.95 279.84 ops/ms 4.52
> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03
> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79
> Double128Vector.SIN 49.36 240.79 ops/ms 4.88
> Double128Vector.SINH 26.59 103.75 ops/ms 3.90
> Double128Vector.TAN 41.05 152.39 ops/ms 3.71
> Double128Vector.TANH 45.29 169.53 ops/ms 3.74
> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96
> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01
> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78
> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44
> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04
> Double256Vector.COS 58.26 389.77 ops/ms 6.69
> Double256Vector.COSH 29.44 151.11 ops/ms 5.13
> Double256Vector.EXP 86.67 564.68 ops/ms 6.52
> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80
> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62
> Double256Vector.LOG 71.52 394.90 ops/ms 5.52
> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54
> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05
> Double256Vector.SIN 57.06 380.98 ops/ms 6.68
> Double256Vector.SINH 29.40 117.37 ops/ms 3.99
> Double256Vector.TAN 44.90 279.90 ops/ms 6.23
> Double256Vector.TANH 54.08 274.71 ops/ms 5.08
> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35
> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57
> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04
> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32
> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69
> Double512Vector.COS 59.88 837.04 ops/ms 13.98
> Double512Vector.COSH 30.34 172.76 ops/ms 5.70
> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14
> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34
> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34
> Double512Vector.LOG 74.84 996.00 ops/ms 13.31
> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72
> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
> Double512Vector.POW 37.42 384.13 ops/ms 10.26
> Double512Vector.SIN 59.74 728.45 ops/ms 12.19
> Double512Vector.SINH 29.47 143.38 ops/ms 4.87
> Double512Vector.TAN 46.20 587.21 ops/ms 12.71
> Double512Vector.TANH 57.36 495.42 ops/ms 8.64
> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06
> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16
> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44
> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28
> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53
> Double64Vector.COS 23.42 152.01 ops/ms 6.49
> Double64Vector.COSH 17.34 113.34 ops/ms 6.54
> Double64Vector.EXP 27.08 203.53 ops/ms 7.52
> Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15
> Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59
> Double64Vector.LOG 26.75 142.63 ops/ms 5.33
> Double64Vector.LOG10 25.85 139.71 ops/ms 5.40
> Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38
> Double64Vector.SIN 23.28 146.

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v17]

2021-06-03 Thread Sandhya Viswanathan
59 ops/ms 5.03
> Double64Vector.TAN 21.00 86.43 ops/ms 4.12
> Double64Vector.TANH 23.75 111.35 ops/ms 4.69
> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92
> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06
> Float128Vector.ATAN 22.52 318.74 ops/ms 14.15
> Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42
> Float128Vector.CBRT 29.72 443.74 ops/ms 14.93
> Float128Vector.COS 42.82 803.02 ops/ms 18.75
> Float128Vector.COSH 31.44 118.34 ops/ms 3.76
> Float128Vector.EXP 72.43 855.33 ops/ms 11.81
> Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38
> Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12
> Float128Vector.LOG 52.95 877.94 ops/ms 16.58
> Float128Vector.LOG10 49.26 603.72 ops/ms 12.26
> Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61
> Float128Vector.SIN 43.38 745.31 ops/ms 17.18
> Float128Vector.SINH 31.11 112.91 ops/ms 3.63
> Float128Vector.TAN 37.25 332.13 ops/ms 8.92
> Float128Vector.TANH 57.63 453.77 ops/ms 7.87
> Float256Vector.ACOS 65.23 123.73 ops/ms 1.90
> Float256Vector.ASIN 63.41 132.86 ops/ms 2.10
> Float256Vector.ATAN 23.51 649.02 ops/ms 27.61
> Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07
> Float256Vector.CBRT 45.99 594.81 ops/ms 12.93
> Float256Vector.COS 43.75 926.69 ops/ms 21.18
> Float256Vector.COSH 33.52 130.46 ops/ms 3.89
> Float256Vector.EXP 75.70 1366.72 ops/ms 18.05
> Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84
> Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34
> Float256Vector.LOG 53.31 1545.77 ops/ms 29.00
> Float256Vector.LOG10 50.31 863.80 ops/ms 17.17
> Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66
> Float256Vector.SIN 44.07 911.04 ops/ms 20.67
> Float256Vector.SINH 33.16 122.50 ops/ms 3.69
> Float256Vector.TAN 37.85 497.75 ops/ms 13.15
> Float256Vector.TANH 64.27 537.20 ops/ms 8.36
> Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52
> Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93
> Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69
> Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57
> Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11
> Float512Vector.COS 40.92 1567.93 ops/ms 38.32
> Float512Vector.COSH 33.42 138.36 ops/ms 4.14
> Float512Vector.EXP 70.51 3835.97 ops/ms 54.41
> Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35
> Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47
> Float512Vector.LOG 49.61 3156.99 ops/ms 63.64
> Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02
> Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81
> Float512Vector.POW 32.73 1015.85 ops/ms 31.04
> Float512Vector.SIN 41.17 1587.71 ops/ms 38.56
> Float512Vector.SINH 33.05 129.39 ops/ms 3.91
> Float512Vector.TAN 35.60 1336.11 ops/ms 37.53
> Float512Vector.TANH 65.77 2295.28 ops/ms 34.90
> Float64Vector.ACOS 48.41 89.34 ops/ms 1.85
> Float64Vector.ASIN 47.30 95.72 ops/ms 2.02
> Float64Vector.ATAN 20.62 49.45 ops/ms 2.40
> Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04
> Float64Vector.CBRT 24.03 134.57 ops/ms 5.60
> Float64Vector.COS 44.28 394.33 ops/ms 8.91
> Float64Vector.COSH 28.35 95.27 ops/ms 3.36
> Float64Vector.EXP 65.80 486.37 ops/ms 7.39
> Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48
> Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93
> Float64Vector.LOG 51.93 163.25 ops/ms 3.14
> Float64Vector.LOG10 49.53 147.98 ops/ms 2.99
> Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77
> Float64Vector.SIN 44.41 382.09 ops/ms 8.60
> Float64Vector.SINH 28.20 90.68 ops/ms 3.22
> Float64Vector.TAN 36.29 160.89 ops/ms 4.43
> Float64Vector.TANH 47.65 214.04 ops/ms 4.49
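
As a reading aid for the table above: the scores are throughput in ops/ms, so the gain column is Optimized divided by Base. A minimal sketch that recomputes two of the quoted rows follows; the class and method names (GainSketch, gain, format) are hypothetical and not part of the PR.

```java
import java.util.Locale;

class GainSketch {
    // Throughput gain: optimized ops/ms divided by base ops/ms.
    static double gain(double baseOpsPerMs, double optimizedOpsPerMs) {
        return optimizedOpsPerMs / baseOpsPerMs;
    }

    // Format to two decimals, as the table does.
    static String format(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    public static void main(String[] args) {
        // Rows copied from the table: Float64Vector.COS and Float64Vector.TANH.
        System.out.println("Float64Vector.COS  " + format(gain(44.28, 394.33)));
        System.out.println("Float64Vector.TANH " + format(gain(47.65, 214.04)));
    }
}
```

Some rows in the table do not reproduce exactly from the rounded Base and Optimized values, presumably because the published gains were computed from unrounded scores.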

Sandhya Viswanathan has updated the pull request with a new target base due to 
a merge or a rebase. The pull request now contains 21 commits:

 - Merge master
 - update javadoc
 - correct javadoc
 - Javadoc changes
 - correct ppc.ad
 - Merge master
 - Commit missing changes
 - Implement Vladimir Ivanov and Paul Sandoz review comments
 - fix 32-bit build
 - Add comments explaining naming convention
 - ... and 11 more: https://git.openjdk.java.net/jdk/compare/52d8215a...03ac3197

-

Changes: https://git.openjdk.java.net/jdk/pull/3638/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=16
  Stats: 416073 lines in 119 files changed: 415886 ins; 124 del; 63 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v16]

2021-06-02 Thread Sandhya Viswanathan

Sandhya Viswanathan has updated the pull request incrementally with one 
additional commit since the last revision:

  update javadoc

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3638/files
  - new: https://git.openjdk.java.net/jdk/pull/3638/files/e5208a18..b229e8b4

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=15
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=14-15

  Stats: 18 lines in 1 file changed: 0 ins; 0 del; 18 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v15]

2021-05-25 Thread Sandhya Viswanathan

Sandhya Viswanathan has updated the pull request incrementally with one 
additional commit since the last revision:

  correct javadoc

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3638/files
  - new: https://git.openjdk.java.net/jdk/pull/3638/files/6cd50248..e5208a18

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=14
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=13-14

  Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v14]

2021-05-25 Thread Sandhya Viswanathan

Sandhya Viswanathan has updated the pull request incrementally with one 
additional commit since the last revision:

  Javadoc changes

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3638/files
  - new: https://git.openjdk.java.net/jdk/pull/3638/files/4d59af0a..6cd50248

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=13
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=12-13

  Stats: 58 lines in 1 file changed: 38 ins; 0 del; 20 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v13]

2021-05-19 Thread Sandhya Viswanathan

Sandhya Viswanathan has updated the pull request incrementally with one 
additional commit since the last revision:

  correct ppc.ad

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3638/files
  - new: https://git.openjdk.java.net/jdk/pull/3638/files/7b959b67..4d59af0a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=12
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=11-12

  Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v12]

2021-05-19 Thread Sandhya Viswanathan
59 ops/ms 5.03
> Double64Vector.TAN 21.00 86.43 ops/ms 4.12
> Double64Vector.TANH 23.75 111.35 ops/ms 4.69
> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92
> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06
> Float128Vector.ATAN 22.52 318.74 ops/ms 14.15
> Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42
> Float128Vector.CBRT 29.72 443.74 ops/ms 14.93
> Float128Vector.COS 42.82 803.02 ops/ms 18.75
> Float128Vector.COSH 31.44 118.34 ops/ms 3.76
> Float128Vector.EXP 72.43 855.33 ops/ms 11.81
> Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38
> Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12
> Float128Vector.LOG 52.95 877.94 ops/ms 16.58
> Float128Vector.LOG10 49.26 603.72 ops/ms 12.26
> Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61
> Float128Vector.SIN 43.38 745.31 ops/ms 17.18
> Float128Vector.SINH 31.11 112.91 ops/ms 3.63
> Float128Vector.TAN 37.25 332.13 ops/ms 8.92
> Float128Vector.TANH 57.63 453.77 ops/ms 7.87
> Float256Vector.ACOS 65.23 123.73 ops/ms 1.90
> Float256Vector.ASIN 63.41 132.86 ops/ms 2.10
> Float256Vector.ATAN 23.51 649.02 ops/ms 27.61
> Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07
> Float256Vector.CBRT 45.99 594.81 ops/ms 12.93
> Float256Vector.COS 43.75 926.69 ops/ms 21.18
> Float256Vector.COSH 33.52 130.46 ops/ms 3.89
> Float256Vector.EXP 75.70 1366.72 ops/ms 18.05
> Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84
> Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34
> Float256Vector.LOG 53.31 1545.77 ops/ms 29.00
> Float256Vector.LOG10 50.31 863.80 ops/ms 17.17
> Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66
> Float256Vector.SIN 44.07 911.04 ops/ms 20.67
> Float256Vector.SINH 33.16 122.50 ops/ms 3.69
> Float256Vector.TAN 37.85 497.75 ops/ms 13.15
> Float256Vector.TANH 64.27 537.20 ops/ms 8.36
> Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52
> Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93
> Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69
> Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57
> Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11
> Float512Vector.COS 40.92 1567.93 ops/ms 38.32
> Float512Vector.COSH 33.42 138.36 ops/ms 4.14
> Float512Vector.EXP 70.51 3835.97 ops/ms 54.41
> Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35
> Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47
> Float512Vector.LOG 49.61 3156.99 ops/ms 63.64
> Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02
> Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81
> Float512Vector.POW 32.73 1015.85 ops/ms 31.04
> Float512Vector.SIN 41.17 1587.71 ops/ms 38.56
> Float512Vector.SINH 33.05 129.39 ops/ms 3.91
> Float512Vector.TAN 35.60 1336.11 ops/ms 37.53
> Float512Vector.TANH 65.77 2295.28 ops/ms 34.90
> Float64Vector.ACOS 48.41 89.34 ops/ms 1.85
> Float64Vector.ASIN 47.30 95.72 ops/ms 2.02
> Float64Vector.ATAN 20.62 49.45 ops/ms 2.40
> Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04
> Float64Vector.CBRT 24.03 134.57 ops/ms 5.60
> Float64Vector.COS 44.28 394.33 ops/ms 8.91
> Float64Vector.COSH 28.35 95.27 ops/ms 3.36
> Float64Vector.EXP 65.80 486.37 ops/ms 7.39
> Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48
> Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93
> Float64Vector.LOG 51.93 163.25 ops/ms 3.14
> Float64Vector.LOG10 49.53 147.98 ops/ms 2.99
> Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77
> Float64Vector.SIN 44.41 382.09 ops/ms 8.60
> Float64Vector.SINH 28.20 90.68 ops/ms 3.22
> Float64Vector.TAN 36.29 160.89 ops/ms 4.43
> Float64Vector.TANH 47.65 214.04 ops/ms 4.49
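
The last column of the table above is headed Gain(Optimized/Base) in the PR description, so it should be the ratio of the two throughput columns. A minimal Java sketch of that arithmetic follows; the class and method names are hypothetical and not part of the PR, and the two rows are copied from the table.

```java
import java.util.Locale;

// Hypothetical helper, not part of the PR: recomputes the Gain column
// from the Base and Optimized throughput columns (both in ops/ms).
class GainColumn {
    // Gain is defined by the table header as Optimized / Base.
    static double gain(double base, double optimized) {
        return optimized / base;
    }

    // Two-decimal formatting, the same precision the table uses.
    static String format(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    public static void main(String[] args) {
        // Float64Vector.TANH row: Base 47.65, Optimized 214.04.
        System.out.println(format(gain(47.65, 214.04)));
        // Float512Vector.LOG1P row: Base 20.66, Optimized 1689.86.
        System.out.println(format(gain(20.66, 1689.86)));
    }
}
```

The first row reproduces the published 4.49. The second comes out near 81.79 rather than the published 81.81, presumably because the published gains were computed from unrounded measurements; that explanation is an assumption, not something stated in the thread.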

Sandhya Viswanathan has updated the pull request with a new target base due to 
a merge or a rebase. The pull request now contains 16 commits:

 - Merge master
 - Commit missing changes
 - Implement Vladimir Ivanov and Paul Sandoz review comments
 - fix 32-bit build
 - Add comments explaining naming convention
 - jcheck fixes
 - Print intrinsic fix
 - Implement review comments
 - Add missing Lib.gmk
 - Merge master
 - ... and 6 more: https://git.openjdk.java.net/jdk/compare/b961f253...7b959b67

-

Changes: https://git.openjdk.java.net/jdk/pull/3638/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=11
  Stats: 416021 lines in 119 files changed: 415854 ins; 124 del; 43 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v11]

2021-05-19 Thread Sandhya Viswanathan
59 ops/ms 5.03
> Double64Vector.TAN 21.00 86.43 ops/ms 4.12
> Double64Vector.TANH 23.75 111.35 ops/ms 4.69
> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92
> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06
> Float128Vector.ATAN 22.52 318.74 ops/ms 14.15
> Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42
> Float128Vector.CBRT 29.72 443.74 ops/ms 14.93
> Float128Vector.COS 42.82 803.02 ops/ms 18.75
> Float128Vector.COSH 31.44 118.34 ops/ms 3.76
> Float128Vector.EXP 72.43 855.33 ops/ms 11.81
> Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38
> Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12
> Float128Vector.LOG 52.95 877.94 ops/ms 16.58
> Float128Vector.LOG10 49.26 603.72 ops/ms 12.26
> Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61
> Float128Vector.SIN 43.38 745.31 ops/ms 17.18
> Float128Vector.SINH 31.11 112.91 ops/ms 3.63
> Float128Vector.TAN 37.25 332.13 ops/ms 8.92
> Float128Vector.TANH 57.63 453.77 ops/ms 7.87
> Float256Vector.ACOS 65.23 123.73 ops/ms 1.90
> Float256Vector.ASIN 63.41 132.86 ops/ms 2.10
> Float256Vector.ATAN 23.51 649.02 ops/ms 27.61
> Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07
> Float256Vector.CBRT 45.99 594.81 ops/ms 12.93
> Float256Vector.COS 43.75 926.69 ops/ms 21.18
> Float256Vector.COSH 33.52 130.46 ops/ms 3.89
> Float256Vector.EXP 75.70 1366.72 ops/ms 18.05
> Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84
> Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34
> Float256Vector.LOG 53.31 1545.77 ops/ms 29.00
> Float256Vector.LOG10 50.31 863.80 ops/ms 17.17
> Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66
> Float256Vector.SIN 44.07 911.04 ops/ms 20.67
> Float256Vector.SINH 33.16 122.50 ops/ms 3.69
> Float256Vector.TAN 37.85 497.75 ops/ms 13.15
> Float256Vector.TANH 64.27 537.20 ops/ms 8.36
> Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52
> Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93
> Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69
> Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57
> Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11
> Float512Vector.COS 40.92 1567.93 ops/ms 38.32
> Float512Vector.COSH 33.42 138.36 ops/ms 4.14
> Float512Vector.EXP 70.51 3835.97 ops/ms 54.41
> Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35
> Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47
> Float512Vector.LOG 49.61 3156.99 ops/ms 63.64
> Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02
> Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81
> Float512Vector.POW 32.73 1015.85 ops/ms 31.04
> Float512Vector.SIN 41.17 1587.71 ops/ms 38.56
> Float512Vector.SINH 33.05 129.39 ops/ms 3.91
> Float512Vector.TAN 35.60 1336.11 ops/ms 37.53
> Float512Vector.TANH 65.77 2295.28 ops/ms 34.90
> Float64Vector.ACOS 48.41 89.34 ops/ms 1.85
> Float64Vector.ASIN 47.30 95.72 ops/ms 2.02
> Float64Vector.ATAN 20.62 49.45 ops/ms 2.40
> Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04
> Float64Vector.CBRT 24.03 134.57 ops/ms 5.60
> Float64Vector.COS 44.28 394.33 ops/ms 8.91
> Float64Vector.COSH 28.35 95.27 ops/ms 3.36
> Float64Vector.EXP 65.80 486.37 ops/ms 7.39
> Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48
> Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93
> Float64Vector.LOG 51.93 163.25 ops/ms 3.14
> Float64Vector.LOG10 49.53 147.98 ops/ms 2.99
> Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77
> Float64Vector.SIN 44.41 382.09 ops/ms 8.60
> Float64Vector.SINH 28.20 90.68 ops/ms 3.22
> Float64Vector.TAN 36.29 160.89 ops/ms 4.43
> Float64Vector.TANH 47.65 214.04 ops/ms 4.49

Sandhya Viswanathan has updated the pull request incrementally with one 
additional commit since the last revision:

  Commit missing changes

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3638/files
  - new: https://git.openjdk.java.net/jdk/pull/3638/files/0b4a1c9c..1b0367ac

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=10
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=09-10

  Stats: 55 lines in 16 files changed: 2 ins; 42 del; 11 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]

2021-05-19 Thread Sandhya Viswanathan
On Wed, 19 May 2021 22:02:14 GMT, Paul Sandoz wrote:

>> Tier 1 to 3 tests pass for the default set of build profiles.
>
>> Thanks a lot for the review @PaulSandoz @iwanowww @erikj79.
>> Paul and Vladimir, I have implemented your review comments. Please take a 
>> look.
> 
> `case VECTOR_OP_OR` is still present.

@PaulSandoz Thanks for pointing that out. I had missed running `git add` on 
some of the files.

-

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]

2021-05-19 Thread Sandhya Viswanathan
On Mon, 3 May 2021 21:41:26 GMT, Paul Sandoz wrote:

>> Sandhya Viswanathan has updated the pull request with a new target base due 
>> to a merge or a rebase. The pull request now contains six commits:
>> 
>>  - Merge master
>>  - remove whitespace
>>  - Merge master
>>  - Small fix
>>  - cleanup
>>  - x86 short vector math optimization for Vector API
>
> Tier 1 to 3 tests pass for the default set of build profiles.

Thanks a lot for the review @PaulSandoz @iwanowww @erikj79.
Paul and Vladimir, I have implemented your review comments. Please take a look.

-

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v10]

2021-05-19 Thread Sandhya Viswanathan

Sandhya Viswanathan has updated the pull request incrementally with one 
additional commit since the last revision:

  Implement Vladimir Ivanov and Paul Sandoz review comments

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3638/files
  - new: https://git.openjdk.java.net/jdk/pull/3638/files/f7e39913..0b4a1c9c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=09
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=08-09

  Stats: 45 lines in 1 file changed: 0 ins; 45 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v9]

2021-05-18 Thread Sandhya Viswanathan

Sandhya Viswanathan has updated the pull request incrementally with one 
additional commit since the last revision:

  fix 32-bit build

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3638/files
  - new: https://git.openjdk.java.net/jdk/pull/3638/files/45f20a34..f7e39913

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=08
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=07-08

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v8]

2021-05-18 Thread Sandhya Viswanathan
On Wed, 19 May 2021 00:58:15 GMT, Sandhya Viswanathan wrote:

>> This PR contains Short Vector Math Library support related changes for 
>> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), 
>> in preparation for when the JEP is targeted.
>> 
>> Intel Short Vector Math Library (SVML) based intrinsics in native x86 
>> assembly provide optimized implementations of the Vector API transcendental 
>> and trigonometric methods.
>> These methods are built into a separate library instead of being part of 
>> libjvm.so or jvm.dll.
>> 
>> The following changes are made:
>>   The source for these methods is placed in the jdk.incubator.vector module 
>> under src/jdk.incubator.vector/linux/native/libsvml and 
>> src/jdk.incubator.vector/windows/native/libsvml.
>>   The assembly source files are named “*.S” and the include files are named 
>> “*.S.inc”.
>>   The corresponding build script is placed at 
>> make/modules/jdk.incubator.vector/Lib.gmk.
>>   Changes are made to the build system to support dependency tracking for 
>> assembly files with includes.
>>   The built native libraries (libsvml.so/svml.dll) are placed in the bin 
>> directory of the JDK on Windows and the lib directory of the JDK on Linux.
>>   The C2 JIT uses dll_load and dll_lookup to get the addresses of the 
>> optimized methods from this library.
>> 
>> Build system changes and module library build scripts were contributed by 
>> Magnus (magnus.ihse.bur...@oracle.com).
>> 
>> Looking forward to your review and feedback.
>> 
>> Performance:
>> Micro benchmark Base Optimized Unit Gain(Optimized/Base)
>> Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
>> Double128Vector.ASIN 45.06 92.36 ops/ms 2.05
>> Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
>> Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79
>> Double128Vector.CBRT 45.77 208.36 ops/ms 4.55
>> Double128Vector.COS 49.94 245.89 ops/ms 4.92
>> Double128Vector.COSH 26.91 126.00 ops/ms 4.68
>> Double128Vector.EXP 71.64 379.65 ops/ms 5.30
>> Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18
>> Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44
>> Double128Vector.LOG 61.95 279.84 ops/ms 4.52
>> Double128Vector.LOG10 59.34 239.05 ops/ms 4.03
>> Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79
>> Double128Vector.SIN 49.36 240.79 ops/ms 4.88
>> Double128Vector.SINH 26.59 103.75 ops/ms 3.90
>> Double128Vector.TAN 41.05 152.39 ops/ms 3.71
>> Double128Vector.TANH 45.29 169.53 ops/ms 3.74
>> Double256Vector.ACOS 54.21 106.39 ops/ms 1.96
>> Double256Vector.ASIN 53.60 107.99 ops/ms 2.01
>> Double256Vector.ATAN 21.53 189.11 ops/ms 8.78
>> Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44
>> Double256Vector.CBRT 56.45 397.13 ops/ms 7.04
>> Double256Vector.COS 58.26 389.77 ops/ms 6.69
>> Double256Vector.COSH 29.44 151.11 ops/ms 5.13
>> Double256Vector.EXP 86.67 564.68 ops/ms 6.52
>> Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80
>> Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62
>> Double256Vector.LOG 71.52 394.90 ops/ms 5.52
>> Double256Vector.LOG10 65.43 362.32 ops/ms 5.54
>> Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05
>> Double256Vector.SIN 57.06 380.98 ops/ms 6.68
>> Double256Vector.SINH 29.40 117.37 ops/ms 3.99
>> Double256Vector.TAN 44.90 279.90 ops/ms 6.23
>> Double256Vector.TANH 54.08 274.71 ops/ms 5.08
>> Double512Vector.ACOS 55.65 687.54 ops/ms 12.35
>> Double512Vector.ASIN 57.31 777.72 ops/ms 13.57
>> Double512Vector.ATAN 21.42 729.21 ops/ms 34.04
>> Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32
>> Double512Vector.CBRT 56.78 834.38 ops/ms 14.69
>> Double512Vector.COS 59.88 837.04 ops/ms 13.98
>> Double512Vector.COSH 30.34 172.76 ops/ms 5.70
>> Double512Vector.EXP 99.66 1608.12 ops/ms 16.14
>> Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34
>> Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34
>> Double512Vector.LOG 74.84 996.00 ops/ms 13.31
>> Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72
>> Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
>> Double512Vector.POW 37.42 384.13 ops/ms 10.26
>> Double512Vector.SIN 59.74 728.45 ops/ms 12.19
>> Double512Vector.SINH 29.47 143.38 ops/ms 4.87
>> Double512Vector.TAN 46.20 587.21 ops/ms 12.71
>> Double512Vector.TANH 57.36 495.42 ops/ms 8.64
>> Double64Vector.ACOS 24.04 73.67 ops/ms 3.06
>> Double64Vector.ASIN 23.78 75.11 ops/ms 3.16
>> Double64Vector.ATAN 14.14 62.81 ops/ms 4.44
>> Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28
>> Double64Vector.CBRT 16.47 107.50 ops/ms 6.53
>> Double64Vector.COS 23.42 152.01 ops/ms 6.49
>> Double64Vector
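
The quoted PR description says the built library (libsvml.so/svml.dll) lands in the bin directory of the JDK on Windows and the lib directory on Linux. The sketch below is a user-side way to check that placement from Java. It is not HotSpot code and does not perform the dll_load/dll_lookup step itself; the class and method names are hypothetical. The second name it probes, jsvml, reflects the later rename to libjsvml.so under 8276025.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical user-side check, not HotSpot code: it computes where the
// PR description says the library is placed and tests whether the file exists.
class SvmlLocation {
    // bin directory on Windows, lib directory otherwise, per the PR description.
    static Path expectedLocation(Path javaHome, String fileName, boolean windows) {
        return javaHome.resolve(windows ? "bin" : "lib").resolve(fileName);
    }

    public static void main(String[] args) {
        boolean windows = System.getProperty("os.name")
                .toLowerCase(Locale.ROOT).startsWith("windows");
        Path javaHome = Paths.get(System.getProperty("java.home"));
        // "svml" is the name used in this PR; "jsvml" is the name after 8276025.
        for (String name : new String[] {"svml", "jsvml"}) {
            // mapLibraryName yields e.g. libsvml.so on Linux and svml.dll on Windows.
            Path p = expectedLocation(javaHome, System.mapLibraryName(name), windows);
            System.out.println(p + (Files.exists(p) ? " (present)" : " (absent)"));
        }
    }
}
```

On a JDK build without the library (for example a non-x86 build, or one that predates this change) both paths are expected to be reported as absent.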

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v7]

2021-05-18 Thread Sandhya Viswanathan
On Wed, 19 May 2021 00:26:48 GMT, Vladimir Kozlov wrote:

>> Sandhya Viswanathan has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   jcheck fixes
>
> This is much much better! Thank you for changing it. I am only asking now to 
> add comment explaining names.

@vnkozlov I have added comments explaining the naming convention. Please let 
me know if this looks OK.

-

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v8]

2021-05-18 Thread Sandhya Viswanathan

Sandhya Viswanathan has updated the pull request incrementally with one 
additional commit since the last revision:

  Add comments explaining naming convention

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3638/files
  - new: https://git.openjdk.java.net/jdk/pull/3638/files/0d1d0382..45f20a34

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=07
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=06-07

  Stats: 15 lines in 1 file changed: 15 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v7]

2021-05-18 Thread Sandhya Viswanathan

Sandhya Viswanathan has updated the pull request incrementally with one 
additional commit since the last revision:

  jcheck fixes

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3638/files
  - new: https://git.openjdk.java.net/jdk/pull/3638/files/11528426..0d1d0382

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=05-06

  Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v6]

2021-05-18 Thread Sandhya Viswanathan
On Tue, 18 May 2021 23:43:13 GMT, Sandhya Viswanathan wrote:


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v6]

2021-05-18 Thread Sandhya Viswanathan
> Double64Vector.SINH 17.62 88.59 ops/ms 5.03
> Double64Vector.TAN 21.00 86.43 ops/ms 4.12
> Double64Vector.TANH 23.75 111.35 ops/ms 4.69
> Float128Vector.ACOS 57.52 110.65 ops/ms 1.92
> Float128Vector.ASIN 57.15 117.95 ops/ms 2.06
> Float128Vector.ATAN 22.52 318.74 ops/ms 14.15
> Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42
> Float128Vector.CBRT 29.72 443.74 ops/ms 14.93
> Float128Vector.COS 42.82 803.02 ops/ms 18.75
> Float128Vector.COSH 31.44 118.34 ops/ms 3.76
> Float128Vector.EXP 72.43 855.33 ops/ms 11.81
> Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38
> Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12
> Float128Vector.LOG 52.95 877.94 ops/ms 16.58
> Float128Vector.LOG10 49.26 603.72 ops/ms 12.26
> Float128Vector.LOG1P 20.89 430.59 ops/ms 20.61
> Float128Vector.SIN 43.38 745.31 ops/ms 17.18
> Float128Vector.SINH 31.11 112.91 ops/ms 3.63
> Float128Vector.TAN 37.25 332.13 ops/ms 8.92
> Float128Vector.TANH 57.63 453.77 ops/ms 7.87
> Float256Vector.ACOS 65.23 123.73 ops/ms 1.90
> Float256Vector.ASIN 63.41 132.86 ops/ms 2.10
> Float256Vector.ATAN 23.51 649.02 ops/ms 27.61
> Float256Vector.ATAN2 18.19 455.95 ops/ms 25.07
> Float256Vector.CBRT 45.99 594.81 ops/ms 12.93
> Float256Vector.COS 43.75 926.69 ops/ms 21.18
> Float256Vector.COSH 33.52 130.46 ops/ms 3.89
> Float256Vector.EXP 75.70 1366.72 ops/ms 18.05
> Float256Vector.EXPM1 39.00 149.72 ops/ms 3.84
> Float256Vector.HYPOT 52.91 1023.18 ops/ms 19.34
> Float256Vector.LOG 53.31 1545.77 ops/ms 29.00
> Float256Vector.LOG10 50.31 863.80 ops/ms 17.17
> Float256Vector.LOG1P 21.51 616.59 ops/ms 28.66
> Float256Vector.SIN 44.07 911.04 ops/ms 20.67
> Float256Vector.SINH 33.16 122.50 ops/ms 3.69
> Float256Vector.TAN 37.85 497.75 ops/ms 13.15
> Float256Vector.TANH 64.27 537.20 ops/ms 8.36
> Float512Vector.ACOS 67.33 1718.00 ops/ms 25.52
> Float512Vector.ASIN 66.12 1780.85 ops/ms 26.93
> Float512Vector.ATAN 22.63 1780.31 ops/ms 78.69
> Float512Vector.ATAN2 17.52 1113.93 ops/ms 63.57
> Float512Vector.CBRT 54.78 2087.58 ops/ms 38.11
> Float512Vector.COS 40.92 1567.93 ops/ms 38.32
> Float512Vector.COSH 33.42 138.36 ops/ms 4.14
> Float512Vector.EXP 70.51 3835.97 ops/ms 54.41
> Float512Vector.EXPM1 38.06 279.80 ops/ms 7.35
> Float512Vector.HYPOT 50.99 3287.55 ops/ms 64.47
> Float512Vector.LOG 49.61 3156.99 ops/ms 63.64
> Float512Vector.LOG10 46.94 2489.16 ops/ms 53.02
> Float512Vector.LOG1P 20.66 1689.86 ops/ms 81.81
> Float512Vector.POW 32.73 1015.85 ops/ms 31.04
> Float512Vector.SIN 41.17 1587.71 ops/ms 38.56
> Float512Vector.SINH 33.05 129.39 ops/ms 3.91
> Float512Vector.TAN 35.60 1336.11 ops/ms 37.53
> Float512Vector.TANH 65.77 2295.28 ops/ms 34.90
> Float64Vector.ACOS 48.41 89.34 ops/ms 1.85
> Float64Vector.ASIN 47.30 95.72 ops/ms 2.02
> Float64Vector.ATAN 20.62 49.45 ops/ms 2.40
> Float64Vector.ATAN2 15.95 112.35 ops/ms 7.04
> Float64Vector.CBRT 24.03 134.57 ops/ms 5.60
> Float64Vector.COS 44.28 394.33 ops/ms 8.91
> Float64Vector.COSH 28.35 95.27 ops/ms 3.36
> Float64Vector.EXP 65.80 486.37 ops/ms 7.39
> Float64Vector.EXPM1 34.61 85.99 ops/ms 2.48
> Float64Vector.HYPOT 50.40 147.82 ops/ms 2.93
> Float64Vector.LOG 51.93 163.25 ops/ms 3.14
> Float64Vector.LOG10 49.53 147.98 ops/ms 2.99
> Float64Vector.LOG1P 19.20 206.81 ops/ms 10.77
> Float64Vector.SIN 44.41 382.09 ops/ms 8.60
> Float64Vector.SINH 28.20 90.68 ops/ms 3.22
> Float64Vector.TAN 36.29 160.89 ops/ms 4.43
> Float64Vector.TANH 47.65 214.04 ops/ms 4.49

Sandhya Viswanathan has updated the pull request incrementally with one 
additional commit since the last revision:

  Print intrinsic fix

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3638/files
  - new: https://git.openjdk.java.net/jdk/pull/3638/files/9021a15c..11528426

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=04-05

  Stats: 9 lines in 1 file changed: 2 ins; 0 del; 7 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v5]

2021-05-18 Thread Sandhya Viswanathan

Sandhya Viswanathan has updated the pull request incrementally with one 
additional commit since the last revision:

  Implement review comments

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3638/files
  - new: https://git.openjdk.java.net/jdk/pull/3638/files/01a549e4..9021a15c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=03-04

  Stats: 1220 lines in 8 files changed: 48 ins; 1104 del; 68 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v4]

2021-05-14 Thread Sandhya Viswanathan

Sandhya Viswanathan has updated the pull request incrementally with one 
additional commit since the last revision:

  Add missing Lib.gmk

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3638/files
  - new: https://git.openjdk.java.net/jdk/pull/3638/files/6e105f51..01a549e4

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=02-03

  Stats: 42 lines in 1 file changed: 42 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v3]

2021-05-14 Thread Sandhya Viswanathan

Sandhya Viswanathan has updated the pull request with a new target base due to 
a merge or a rebase. The pull request now contains seven commits:

 - Merge master
 - Merge master
 - remove whitespace
 - Merge master
 - Small fix
 - cleanup
 - x86 short vector math optimization for Vector API

-

Changes: https://git.openjdk.java.net/jdk/pull/3638/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=02
  Stats: 417101 lines in 120 files changed: 416935 ins; 123 del; 43 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]

2021-05-04 Thread Sandhya Viswanathan
On Wed, 28 Apr 2021 21:11:26 GMT, Sandhya Viswanathan wrote:

>> This PR contains changes related to Short Vector Math Library support for 
>> [JEP-414 Vector API (Second Incubator)](https://openjdk.java.net/jeps/414), 
>> in preparation for when the JEP is targeted.
>> 
>> Intel Short Vector Math Library (SVML) based intrinsics in native x86 
>> assembly provide optimized implementations of the Vector API transcendental 
>> and trigonometric methods.
>> These methods are built into a separate library instead of being part of 
>> libjvm.so or jvm.dll.
>> 
>> The following changes are made:
>>    The source for these methods is placed in the jdk.incubator.vector module 
>> under src/jdk.incubator.vector/linux/native/libsvml and 
>> src/jdk.incubator.vector/windows/native/libsvml.
>>    The assembly source files are named “*.S” and the include files are named 
>> “*.S.inc”.
>>    The corresponding build script is placed at 
>> make/modules/jdk.incubator.vector/Lib.gmk.
>>    Changes are made to the build system to support dependency tracking for 
>> assembly files with includes.
>>    The built native libraries (libsvml.so/svml.dll) are placed in the bin 
>> directory of the JDK on Windows and the lib directory of the JDK on Linux.
>>    The C2 JIT uses dll_load and dll_lookup to get the addresses of the 
>> optimized methods from this library.
>> 
>> Build system changes and module library build scripts are contributed by 
>> Magnus (magnus.ihse.bur...@oracle.com).
>> 
>> Looking forward to your review and feedback.
>> 

Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]

2021-05-03 Thread Sandhya Viswanathan
On Mon, 3 May 2021 21:41:26 GMT, Paul Sandoz wrote:

>> Sandhya Viswanathan has updated the pull request with a new target base due 
>> to a merge or a rebase. The pull request now contains six commits:
>> 
>>  - Merge master
>>  - remove whitespace
>>  - Merge master
>>  - Small fix
>>  - cleanup
>>  - x86 short vector math optimization for Vector API
>
> Tier 1 to 3 tests pass for the default set of build profiles.

@PaulSandoz Thanks a lot for running through the tests.

-

PR: https://git.openjdk.java.net/jdk/pull/3638


Re: RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics [v2]

2021-04-28 Thread Sandhya Viswanathan

Sandhya Viswanathan has updated the pull request with a new target base due to 
a merge or a rebase. The pull request now contains six commits:

 - Merge master
 - remove whitespace
 - Merge master
 - Small fix
 - cleanup
 - x86 short vector math optimization for Vector API

-

Changes: https://git.openjdk.java.net/jdk/pull/3638/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3638&range=01
  Stats: 417102 lines in 120 files changed: 416935 ins; 123 del; 44 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3638.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3638/head:pull/3638

PR: https://git.openjdk.java.net/jdk/pull/3638


RFR: 8265783: Create a separate library for x86 Intel SVML assembly intrinsics

2021-04-22 Thread Sandhya Viswanathan
Intel Short Vector Math Library (SVML) based intrinsics in native x86 assembly 
provide optimized implementations of the Vector API transcendental and 
trigonometric methods.
These methods are built into a separate library instead of being part of 
libjvm.so or jvm.dll.

The following changes are made:
   The source for these methods is placed in the jdk.incubator.vector module 
under src/jdk.incubator.vector/linux/native/libsvml and 
src/jdk.incubator.vector/windows/native/libsvml.
   The assembly source files are named “*.S” and the include files are named 
“*.S.inc”.
   The corresponding build script is placed at 
make/modules/jdk.incubator.vector/Lib.gmk.
   Changes are made to the build system to support dependency tracking for 
assembly files with includes.
   The built native libraries (libsvml.so/svml.dll) are placed in the bin 
directory of the JDK on Windows and the lib directory of the JDK on Linux.
   The C2 JIT uses dll_load and dll_lookup to get the addresses of the 
optimized methods from this library.
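
To make the library placement described above concrete, here is a minimal Java sketch that derives the expected file name and directory from an OS name. It is only an illustration under stated assumptions: the class SvmlLibraryLocation and its methods are hypothetical names, not part of the JDK or of this patch, and only the two platforms named above (Linux and Windows) are modelled; any non-Windows OS name is treated like Linux.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

/**
 * Hypothetical helper (not part of the JDK): maps an OS name to the
 * library file name and JDK subdirectory described in the message.
 */
final class SvmlLibraryLocation {

    /** "svml" -> "svml.dll" on Windows, "libsvml.so" otherwise (assumed Linux). */
    static String libraryFileName(String osName, String baseName) {
        return isWindows(osName) ? baseName + ".dll" : "lib" + baseName + ".so";
    }

    /** Library is expected under bin on Windows and under lib on Linux. */
    static Path expectedPath(String javaHome, String osName, String baseName) {
        String dir = isWindows(osName) ? "bin" : "lib";
        return Paths.get(javaHome, dir, libraryFileName(osName, baseName));
    }

    private static boolean isWindows(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    public static void main(String[] args) {
        // Prints where the library would be expected for the running JDK.
        System.out.println(expectedPath(System.getProperty("java.home"),
                                        System.getProperty("os.name"),
                                        "svml"));
    }
}
```

The base name is a parameter because a later change in this thread (8276025) renames the library from libsvml.so to libjsvml.so.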

Build system changes and module library build scripts are contributed by Magnus 
(magnus.ihse.bur...@oracle.com).

This work is part of the second round of incubation of the Vector API.
JEP: https://bugs.openjdk.java.net/browse/JDK-8261663

Please review.

Performance:
Micro benchmark Base Optimized Unit Gain(Optimized/Base)
Double128Vector.ACOS 45.91 87.34 ops/ms 1.90
Double128Vector.ASIN 45.06 92.36 ops/ms 2.05
Double128Vector.ATAN 19.92 118.36 ops/ms 5.94
Double128Vector.ATAN2 15.24 88.17 ops/ms 5.79
Double128Vector.CBRT 45.77 208.36 ops/ms 4.55
Double128Vector.COS 49.94 245.89 ops/ms 4.92
Double128Vector.COSH 26.91 126.00 ops/ms 4.68
Double128Vector.EXP 71.64 379.65 ops/ms 5.30
Double128Vector.EXPM1 35.95 150.37 ops/ms 4.18
Double128Vector.HYPOT 50.67 174.10 ops/ms 3.44
Double128Vector.LOG 61.95 279.84 ops/ms 4.52
Double128Vector.LOG10 59.34 239.05 ops/ms 4.03
Double128Vector.LOG1P 18.56 200.32 ops/ms 10.79
Double128Vector.SIN 49.36 240.79 ops/ms 4.88
Double128Vector.SINH 26.59 103.75 ops/ms 3.90
Double128Vector.TAN 41.05 152.39 ops/ms 3.71
Double128Vector.TANH 45.29 169.53 ops/ms 3.74
Double256Vector.ACOS 54.21 106.39 ops/ms 1.96
Double256Vector.ASIN 53.60 107.99 ops/ms 2.01
Double256Vector.ATAN 21.53 189.11 ops/ms 8.78
Double256Vector.ATAN2 16.67 140.76 ops/ms 8.44
Double256Vector.CBRT 56.45 397.13 ops/ms 7.04
Double256Vector.COS 58.26 389.77 ops/ms 6.69
Double256Vector.COSH 29.44 151.11 ops/ms 5.13
Double256Vector.EXP 86.67 564.68 ops/ms 6.52
Double256Vector.EXPM1 41.96 201.28 ops/ms 4.80
Double256Vector.HYPOT 66.18 305.74 ops/ms 4.62
Double256Vector.LOG 71.52 394.90 ops/ms 5.52
Double256Vector.LOG10 65.43 362.32 ops/ms 5.54
Double256Vector.LOG1P 19.99 300.88 ops/ms 15.05
Double256Vector.SIN 57.06 380.98 ops/ms 6.68
Double256Vector.SINH 29.40 117.37 ops/ms 3.99
Double256Vector.TAN 44.90 279.90 ops/ms 6.23
Double256Vector.TANH 54.08 274.71 ops/ms 5.08
Double512Vector.ACOS 55.65 687.54 ops/ms 12.35
Double512Vector.ASIN 57.31 777.72 ops/ms 13.57
Double512Vector.ATAN 21.42 729.21 ops/ms 34.04
Double512Vector.ATAN2 16.37 414.33 ops/ms 25.32
Double512Vector.CBRT 56.78 834.38 ops/ms 14.69
Double512Vector.COS 59.88 837.04 ops/ms 13.98
Double512Vector.COSH 30.34 172.76 ops/ms 5.70
Double512Vector.EXP 99.66 1608.12 ops/ms 16.14
Double512Vector.EXPM1 43.39 318.61 ops/ms 7.34
Double512Vector.HYPOT 73.87 1502.72 ops/ms 20.34
Double512Vector.LOG 74.84 996.00 ops/ms 13.31
Double512Vector.LOG10 71.12 1046.52 ops/ms 14.72
Double512Vector.LOG1P 19.75 776.87 ops/ms 39.34
Double512Vector.POW 37.42 384.13 ops/ms 10.26
Double512Vector.SIN 59.74 728.45 ops/ms 12.19
Double512Vector.SINH 29.47 143.38 ops/ms 4.87
Double512Vector.TAN 46.20 587.21 ops/ms 12.71
Double512Vector.TANH 57.36 495.42 ops/ms 8.64
Double64Vector.ACOS 24.04 73.67 ops/ms 3.06
Double64Vector.ASIN 23.78 75.11 ops/ms 3.16
Double64Vector.ATAN 14.14 62.81 ops/ms 4.44
Double64Vector.ATAN2 10.38 44.43 ops/ms 4.28
Double64Vector.CBRT 16.47 107.50 ops/ms 6.53
Double64Vector.COS 23.42 152.01 ops/ms 6.49
Double64Vector.COSH 17.34 113.34 ops/ms 6.54
Double64Vector.EXP 27.08 203.53 ops/ms 7.52
Double64Vector.EXPM1 18.77 96.73 ops/ms 5.15
Double64Vector.HYPOT 18.54 103.62 ops/ms 5.59
Double64Vector.LOG 26.75 142.63 ops/ms 5.33
Double64Vector.LOG10 25.85 139.71 ops/ms 5.40
Double64Vector.LOG1P 13.26 97.94 ops/ms 7.38
Double64Vector.SIN 23.28 146.91 ops/ms 6.31
Double64Vector.SINH 17.62 88.59 ops/ms 5.03
Double64Vector.TAN 21.00 86.43 ops/ms 4.12
Double64Vector.TANH 23.75 111.35 ops/ms 4.69
Float128Vector.ACOS 57.52 110.65 ops/ms 1.92
Float128Vector.ASIN 57.15 117.95 ops/ms 2.06
Float128Vector.ATAN 22.52 318.74 ops/ms 14.15
Float128Vector.ATAN2 17.06 246.07 ops/ms 14.42
Float128Vector.CBRT 29.72 443.74 ops/ms 14.93
Float128Vector.COS 42.82 803.02 ops/ms 18.75
Float128Vector.COSH 31.44 118.34 ops/ms 3.76
Float128Vector.EXP 72.43 855.33 ops/ms 11.81
Float128Vector.EXPM1 37.82 127.85 ops/ms 3.38
Float128Vector.HYPOT 53.20 591.68 ops/ms 11.12
Float128Vector.LOG 52.95 877.94 ops/ms 16.5
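
The Gain column in the table above is defined in its header as Optimized/Base. As a rough sanity check, a row can be parsed and the ratio recomputed with a small Java sketch like the one below. The class GainCheck and its methods are hypothetical names introduced only for illustration. Because the Base and Optimized values are already rounded to two decimals, the recomputed ratio can differ slightly from the reported one, so the tolerance of 0.05 used here is an assumption, not something stated in the message.

```java
import java.util.Locale;

/** Hypothetical helper: recomputes Gain (Optimized/Base) for one table row. */
final class GainCheck {

    /** One parsed row of the benchmark table. */
    static final class Row {
        final String name;
        final double base;
        final double optimized;
        final String unit;
        final double reportedGain;

        Row(String name, double base, double optimized, String unit, double reportedGain) {
            this.name = name;
            this.base = base;
            this.optimized = optimized;
            this.unit = unit;
            this.reportedGain = reportedGain;
        }
    }

    /** Parses a row such as "Double128Vector.ACOS 45.91 87.34 ops/ms 1.90". */
    static Row parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 5) {
            throw new IllegalArgumentException("expected 5 fields: " + line);
        }
        return new Row(f[0], Double.parseDouble(f[1]), Double.parseDouble(f[2]),
                       f[3], Double.parseDouble(f[4]));
    }

    /** Gain as defined in the table header: Optimized / Base. */
    static double gain(Row r) {
        return r.optimized / r.base;
    }

    /** True if the recomputed gain is within the tolerance of the reported gain. */
    static boolean consistent(Row r, double tolerance) {
        return Math.abs(gain(r) - r.reportedGain) <= tolerance;
    }

    public static void main(String[] args) {
        Row r = parse("Double128Vector.ACOS 45.91 87.34 ops/ms 1.90");
        System.out.println(String.format(Locale.ROOT, "%s gain=%.2f consistent=%b",
                r.name, gain(r), consistent(r, 0.05)));
    }
}
```

A truncated row, such as the last line of the table above, fails the five-field check in parse and is rejected rather than guessed at.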

Re: RFR: 8223347: Integration of Vector API (Incubator) [v4]

2020-10-13 Thread Sandhya Viswanathan
On Tue, 13 Oct 2020 21:29:52 GMT, Ekaterina Pavlova wrote:

>> Build changes look good.
>
> Several GC tests crashed in panama-vector tier3 testing; these crashes do 
> not seem to be observed in the openjdk repo.
> The crashes look like:
> #  assert(oopDesc::is_oop(obj)) failed: not an oop: 0xfff1
> #
> # JRE version: Java(TM) SE Runtime Environment (16.0+3) (fastdebug build 
> 16-panama+3-216)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 16-panama+3-216, 
> mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
> # Problematic frame:
> # V  [libjvm.so+0xd8ef94]  HandleArea::allocate_handle(oop)+0x144
> 
> The issue is actually tracked by JDK-8233199.
> 
> This issue needs to be at least analyzed before integrating the Vector API.

@katyapav Is the failure observed on the vector-unstable branch of panama-vector?
The code in this pull request is from the vector-unstable branch.
The bug report https://bugs.openjdk.java.net/browse/JDK-8233199 refers to 
repo-valhalla and not panama-vector:vector-unstable. @PaulSandoz is doing 
final testing of the pull request today, before integration, hopefully tomorrow.

-

PR: https://git.openjdk.java.net/jdk/pull/367