RE: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64

2020-01-08 Thread Kamath, Smita
Hi Alan,

Thanks for the comment. I added these checks for caution but on further 
analysis it appears that these checks are not needed.

1)  For PrimitiveRightShift, we have the input array and its length passed as 
arguments. Since the input array, result[], is copied using arraycopy, we know 
that IOOBE will be thrown.  Therefore, it is safe to remove the checkIndex 
condition. 
 int result[] = new int[nInts+len+1]; 
 System.arraycopy(a, 0, result, 0, len);
 primitiveRightShift(result, result.length, 32 - nBits);

2) In case of PrimitiveLeftShift, I checked the call stack and verified that 
the length passed to the method is same as the length of the array. Therefore, 
the check is not necessary in this scenario.

3) In case of shiftLeft and shiftRight,  the number of iterations (numIter) is 
computed from the mag.length. The newMag array length is also computed and 
adjusted based on mag.length. 

If these changes look fine, I can send an updated webrev removing these checks. 
Please let me know.  

Thanks and Regards,
Smita Kamath

-Original Message-
From: Alan Bateman  
Sent: Saturday, December 28, 2019 12:22 AM
To: Vladimir Kozlov ; Kamath, Smita 

Cc: core-libs-dev@openjdk.java.net; 'hotspot compiler' 

Subject: Re: RFR(M):8167065: Add intrinsic support for double precision 
shifting on x86_64

On 20/12/2019 22:19, Vladimir Kozlov wrote:
> We should have added core-libs to review since you modified 
> BigInteger.java.
>
This adds Objects.checkFromToIndex checks in the middle of several supporting 
methods. Is IOOBE really possible in these cases or are these stand in for 
always-on asserts to ensure the instrinic is never used when the preconditions 
aren't satisfied?

-Alan


RE: RFR(M):8167065: Add intrinsic support for double precision shifting on x86_64

2019-12-23 Thread Kamath, Smita
Hi Vladimir, 

Thanks for reviewing the code. Can you please sponsor and push the changes? 

Regards,
Smita

-Original Message-
From: Vladimir Kozlov  
Sent: Friday, December 20, 2019 9:12 PM
To: Kamath, Smita 
Cc: 'hotspot compiler' ; 
core-libs-dev@openjdk.java.net
Subject: Re: RFR(M):8167065: Add intrinsic support for double precision 
shifting on x86_64

Testing results are good after fixing the typo.

We should consider implementing this intrinsic in Graal too. We have to upload 
AOT and Graal test changes anyway.

Thanks,
Vladimir

On 12/20/19 3:47 PM, Vladimir Kozlov wrote:
> Hi Smita,
> 
> You have typo (should be supports_vbmi2):
> 
> src/hotspot/cpu/x86/assembler_x86.cpp:6547:22: error: 'support_vbmi2' is not 
> a member of 'VM_Version'
>      assert(VM_Version::support_vbmi2(), "requires vbmi2");
>     ^
> 
> Debug build failed. I am retesting with local fix.
> 
> Regards,
> Vladimir K
> 
> On 12/20/19 2:19 PM, Vladimir Kozlov wrote:
>> We should have added core-libs to review since you modified BigInteger.java.
>>
>> webrev02 looks good to me. Let me test it.
>>
>> Thanks,
>> Vladimir
>>
>> On 12/20/19 1:52 PM, Kamath, Smita wrote:
>>> Hi Vladimir,
>>>
>>> Thank you for reviewing the code. I have updated the code as per your 
>>> recommendations ( please look at the email below).
>>> Link to the updated webrev: 
>>> https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev02/
>>>
>>> Regards,
>>> Smita
>>>
>>> -Original Message-
>>> From: Vladimir Kozlov 
>>> Sent: Thursday, December 19, 2019 5:17 PM
>>> To: Kamath, Smita 
>>> Cc: Viswanathan, Sandhya ; 'hotspot 
>>> compiler' 
>>> Subject: Re: RFR(M):8167065: Add intrinsic support for double 
>>> precision shifting on x86_64
>>>
>>> We missed AOT and JVMCI (in HS) changes similar for Base64 [1] to record 
>>> StubRoutines pointers:
>>>
>>> StubRoutines::_bigIntegerRightShiftWorker
>>> StubRoutines::_bigIntegerLeftShiftWorker
>>> Smita>>>done
>>>
>>> In the test add an other @run command with default setting (without 
>>> -XX:-TieredCompilation -Xbatch).
>>> Smita>>>done
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> [1] http://cr.openjdk.java.net/~srukmannagar/Base64/webrev.01/
>>>
>>> On 12/18/19 6:33 PM, Kamath, Smita wrote:
>>>> Hi Vladimir,
>>>>
>>>> I have made the code changes you suggested (please look at the email 
>>>> below).
>>>> I have also enabled the intrinsic to run only when VBMI2 feature is 
>>>> available.
>>>> The intrinsic shows gains of >1.5x above 4k bit BigInteger.
>>>>
>>>> Webrev link:
>>>> https://cr.openjdk.java.net/~svkamath/bigIntegerShift/webrev01/
>>>>
>>>> Thanks,
>>>> Smita
>>>>
>>>> -Original Message-
>>>> From: Vladimir Kozlov 
>>>> Sent: Wednesday, December 11, 2019 10:55 AM
>>>> To: Kamath, Smita ; 'hotspot compiler'
>>>> ; Viswanathan, Sandhya 
>>>> 
>>>> Subject: Re: RFR(M):8167065: Add intrinsic support for double 
>>>> precision shifting on x86_64
>>>>
>>>> Hi Kamath,
>>>>
>>>> First, general question. What performance you see when VBMI2 
>>>> instructions are *not* used with your new code vs code generated by C2.
>>>> What improvement you see when VBMI2 is used. This is to understand if we 
>>>> need only VBMI2 version of intrinsic or not.
>>>>
>>>> Second. Sandhya recently pushed 8235510 changes to rollback avx512 
>>>> code for CRC32 due to performance issues. Does you change has any issues 
>>>> on some Intel's CPU too? Should it be excluded on such CPUs?
>>>>
>>>> Third. I would suggest to wait after we fork JDK 14 with this 
>>>> changes. I think it may be too late for 14 because we would need test this 
>>>> including performance testing.
>>>>
>>>> In assembler_x86.cpp use supports_vbmi2() instead of UseVBMI2 in assert.
>>>> For that to work in vm_version_x86.cpp#l687 clear CPU_VBMI2 bit 
>>>> when UseAVX < 3 ( < avx512). You can also use
>>>> supports_vbmi2() instead of (UseAVX > 2 && UseVBMI2) in 
>>>> stubGenerator_x86_64.cpp combinations with that.
>>>> Smita >&g

RE: RFR(S)JDK-8214074: Ghash optimization using AVX instructions

2018-11-20 Thread Kamath, Smita
Hi Tony,

Thanks for your comments. 

1)  This intrinsic is also used with solaris-sparc, has there been a sanity 
check by anyone to make sure this does not break the sparc intrinsic? It may 
work as the code is now since the sparc intrinsic will only use the first two 
longs in the subkeyHtbl. 
Would it be possible to help with this sanity check?  I don't have the required 
set-up to test this scenario. 

2) I have changed the code so that subkeyH corresponds to the first two entries 
of subkeyHtbl.  Please find the updated webrev link.

Bug link: https://bugs.openjdk.java.net/browse/JDK-8214074

Webrev link: http://cr.openjdk.java.net/~svkamath/ghash/webrev01/

Thanks and Regards,
Smita



-Original Message-
From: Anthony Scarpino [mailto:anthony.scarp...@oracle.com] 
Sent: Tuesday, November 20, 2018 2:05 PM
To: Kamath, Smita ; 'Vladimir Kozlov' 

Cc: Viswanathan, Sandhya ; 
core-libs-dev@openjdk.java.net; hotspot compiler 

Subject: Re: RFR(S)JDK-8214074: Ghash optimization using AVX instructions

On 11/19/18 12:50 PM, Kamath, Smita wrote:
> Hi Vladimir,
> 
> I'd like to contribute an optimization for GHASH Algorithm using AVX 
> Instructions. I have tested this optimization on SKX x86_64 platform 
> and it shows ~20-30% performance improvement for larger message sizes 
> (for example 8k).
> 
> I, smita.kam...@intel.com <mailto:smita.kam...@intel.com> , Shay 
> Gueuron, (shay.gue...@intel.com <mailto:shay.gue...@intel.com>)and 
> Regev Shemy (regev.sh...@intel.com <mailto:regev.sh...@intel.com>) are 
> contributors to this code.
> 
> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8214074
> 
> Link to webrev: http://cr.openjdk.java.net/~svkamath/ghash/webrev/
> 
> For testing the implementation, I have executed TestAESMain.java. I 
> have executed Jtreg tests and tested this code on 64 bit Windows and 
> Linux platforms.
> 
> Please review and let me know if you have any comments.
> 
> Thanks and Regards,
> 
> Smita
> 

Hi,

This intrinsic is also used with solaris-sparc, has there been a sanity check 
by anyone to make sure this does not break the sparc intrinsic? 
It may work as the code is now since the sparc intrinsic will only use the 
first two longs in the subkeyHtbl.

In that same idea, have you consider combining the subkeyH to be the first two 
of subkeyHtbl for the non-avx operations?  It would eliminate an extra two 
longs per instance.

Tony


RE: RFR(S)JDK-8214074: Ghash optimization using AVX instructions

2018-11-20 Thread Kamath, Smita
Hi Bernd,

I agree to both of your comments and will update my code with the changes.

Thanks,
Smita

From: Bernd Eckenfels [mailto:e...@zusammenkunft.net]
Sent: Monday, November 19, 2018 2:27 PM
To: Kamath, Smita ; 'Vladimir Kozlov' 

Cc: core-libs-dev@openjdk.java.net; security-...@openjdk.java.net
Subject: Re: RFR(S)JDK-8214074: Ghash optimization using AVX instructions

Hello,

What is the purpose of setting some of them to 0 twice? (It's a new array which 
should be all-0 anyway.)

+  for (int i = 1; i < 9 ; i++) {
+subkeyHtbl[2*i] = 0;
+subkeyHtbl[2*i+1] = 0;
+}

Also, is the subkeyH no longer be needed (or can be redesigned to use 
subkeyHtbl[0] and 1?

Gruss
Bernd
--
http://bernd.eckenfels.net


Von: core-libs-dev 
mailto:core-libs-dev-boun...@openjdk.java.net>>
 im Auftrag von Kamath, Smita 
mailto:smita.kam...@intel.com>>
Gesendet: Montag, November 19, 2018 10:52 PM
An: 'Vladimir Kozlov'
Cc: Anthony Scarpino; 
core-libs-dev@openjdk.java.net<mailto:core-libs-dev@openjdk.java.net>; hotspot 
compiler
Betreff: RFR(S)JDK-8214074: Ghash optimization using AVX instructions

Hi Vladimir,

I'd like to contribute an optimization for GHASH Algorithm using AVX 
Instructions. I have tested this optimization on SKX x86_64 platform and it 
shows ~20-30% performance improvement for larger message sizes (for example 8k).

I, 
smita.kam...@intel.com<mailto:smita.kam...@intel.com<mailto:smita.kam...@intel.com%3cmailto:smita.kam...@intel.com>>
 , Shay Gueuron, 
(shay.gue...@intel.com<mailto:shay.gue...@intel.com<mailto:shay.gue...@intel.com%3cmailto:shay.gue...@intel.com>>)
 and Regev Shemy 
(regev.sh...@intel.com<mailto:regev.sh...@intel.com<mailto:regev.sh...@intel.com%3cmailto:regev.sh...@intel.com>>)
 are contributors to this code.

Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8214074

Link to webrev: http://cr.openjdk.java.net/~svkamath/ghash/webrev/

For testing the implementation, I have executed TestAESMain.java. I have 
executed Jtreg tests and tested this code on 64 bit Windows and Linux platforms.

Please review and let me know if you have any comments.

Thanks and Regards,
Smita


RE: RFR(S)JDK-8214074: Ghash optimization using AVX instructions

2018-11-19 Thread Kamath, Smita
Hi Florian,

The performance gain that I reported was over the earlier implementation of 
GHASH using clmul instructions (generate_ghash_process_blocks).
 
Regards,
Smita

-Original Message-
From: Florian Weimer [mailto:fwei...@redhat.com] 
Sent: Monday, November 19, 2018 1:55 PM
To: Kamath, Smita 
Cc: 'Vladimir Kozlov' ; Anthony Scarpino 
; core-libs-dev@openjdk.java.net; hotspot compiler 

Subject: Re: RFR(S)JDK-8214074: Ghash optimization using AVX instructions

* Smita Kamath:

> I'd like to contribute an optimization for GHASH Algorithm using AVX 
> Instructions. I have tested this optimization on SKX x86_64 platform 
> and it shows ~20-30% performance improvement for larger message sizes 
> (for example 8k).

Performance improvement against what?  The pure Java implementation?  I find 
this a bit surprising.  I would have expected a much larger performance 
improvement from the use of CLMUL instructions.

Thanks,
Florian


RFR(S)JDK-8214074: Ghash optimization using AVX instructions

2018-11-19 Thread Kamath, Smita
Hi Vladimir,

I'd like to contribute an optimization for GHASH Algorithm using AVX 
Instructions. I have tested this optimization on SKX x86_64 platform and it 
shows ~20-30% performance improvement for larger message sizes (for example 8k).

I, smita.kam...@intel.com , Shay Gueuron, 
(shay.gue...@intel.com) and Regev Shemy 
(regev.sh...@intel.com) are contributors to this 
code.

Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8214074

Link to webrev: http://cr.openjdk.java.net/~svkamath/ghash/webrev/

For testing the implementation, I have executed TestAESMain.java. I have 
executed Jtreg tests and tested this code on 64 bit Windows and Linux platforms.

Please review and let me know if you have any comments.

Thanks and Regards,
Smita



RE: RFR(S) JDK-8205528: Base64 Encode Algorithm using AVX512 Instructions

2018-06-25 Thread Kamath, Smita
Hi Vladimir,

Please find the updated webrev link. 

Webrev Link: http://cr.openjdk.java.net/~srukmannagar/Base64/webrev.00/
Bug link: https://bugs.openjdk.java.net/browse/JDK-8205528

Please let me know if you have additional questions.

Thanks and Regards,
Smita

-Original Message-
From: Vladimir Kozlov [mailto:vladimir.koz...@oracle.com] 
Sent: Monday, June 25, 2018 10:48 AM
To: Kamath, Smita 
Cc: hotspot compiler ; 
core-libs-dev@openjdk.java.net; Paul Sandoz 
Subject: Re: RFR(S) JDK-8205528: Base64 Encode Algorithm using AVX512 
Instructions

I forgot to reply to your answers.

On 6/22/18 2:49 PM, Kamath, Smita wrote:
> Hi Vladimir,
> 
> Please see my answers to your questions as below:
> 
> 1) One question so: why you have own copy of base64 charsets and not using 
> one in library:
> I am using vpgatherdd instruction to fetch multiple values from base64 array 
> in a single instruction with a vector index. Vpgatherdd instruction works on 
> 32 bit array and so I need to define base64 charset in a 32 bit array. I have 
> given reference to gather instruction below.

As was discussed in an other e-mail lets keep your copy.

> 
> 2) Some indents in new and old Assembler::emit_operand() are off. In new 
> Assembler::emit_operand() is better use } else { instead of 'return' in one 
> branch.
> I'll make the necessary changes and send an updated webrev.
> 
> 3) This is first time I see that XMM register can be used for index in 
> address. Is it true? Can you point to Intel's document which describes it.
> I am using vpgatherdd instruction which requires index vector with scale. It 
> uses VSIB addressing where the index register is a zmm register.
> Please refer to reference manual, volume 2c, page 2211: 
> https://software.intel.com/sites/default/files/managed/39/c5/325462-sd
> m-vol-1-2abcd-3abcd.pdf Also see, section 2.3.12, page 524 for VSIB 
> memory addressing information.

Got it. Thanks for document reference.

> 
> 4)  Please, add tests to test/hotspot/jtreg/compiler/intrinsics/base64 (may 
> be similar to sha or AES in compiler/codegen/aes) to make sure that all 
> flags, intrinsic is used and it produces correct result.
> I will add test cases as per your suggestion.

Thanks,
Vladimir

> 
> Please let me know if you have additional questions.
> 
> Thanks,
> Smita
> 
> -Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.koz...@oracle.com]
> Sent: Friday, June 22, 2018 12:29 PM
> To: Kamath, Smita 
> Cc: hotspot compiler 
> Subject: Re: RFR(S) JDK-8205528: Base64 Encode Algorithm using AVX512 
> Instructions
> 
> Hi Smita,
> 
> I CCing to Libs to review code changes in Base64.java.
> 
> Looks like you changed all need place to implement intrinsic.
> One question so: why you have own copy of base64 charsets and not using one 
> in library:
> 
>private int encode0(byte[] src, int off, int end, byte[] dst) {
>char[] base64 = isURL ? toBase64URL : toBase64;
> 
> Some indents in new and old Assembler::emit_operand() are off.
> 
> In new Assembler::emit_operand() is better use } else { instead of 'return' 
> in one branch.
> 
> This is first time I see that XMM register can be used for index in address. 
> Is it true? Can you point to Intel's document which describes it.
> 
> What testing you did?
> 
> Please, add tests to test/hotspot/jtreg/compiler/intrinsics/base64 (may be 
> similar to sha or AES in compiler/codegen/aes) to make sure that all flags, 
> intrinsic is used and it produces correct result.
> 
> I know there is test/jdk/java/util/Base64/ tests but they may not trigger 
> intrinsic usage. But you can use them as starting point for new tests.
> 
> Thanks,
> Vladimir
> 
> On 6/22/18 11:47 AM, Kamath, Smita wrote:
>> Hi Vladimir,
>>
>> I'd like to contribute an optimization for Base64 Encoding Algorithm 
>> using AVX512 Instructions. This optimization shows 1.5x improvement 
>> on
>> x86_64 platform(SKL).
>>
>> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8205528
>>
>> Link to webrev:
>> http://cr.openjdk.java.net/~vdeshpande/Base64/webrev.00/
>>
>> For testing the implementation, I have run tests under 
>> test/jdk/java/util/Base64/ and also ran 
>> test/jdk/com/sun/jndi/ldap/Base64Test.java
>>
>> I have also run jtreg.
>>
>> Please review and let me know if you have any comments.
>>
>> Thanks and Regards,
>>
>> Smita
>>


RE: RFR(S) JDK-8205528: Base64 Encode Algorithm using AVX512 Instructions

2018-06-22 Thread Kamath, Smita
Hi Vladimir,

Please see my answers to your questions as below:

1) One question so: why you have own copy of base64 charsets and not using one 
in library:
I am using vpgatherdd instruction to fetch multiple values from base64 array in 
a single instruction with a vector index. Vpgatherdd instruction works on 32 
bit array and so I need to define base64 charset in a 32 bit array. I have 
given reference to gather instruction below.

2) Some indents in new and old Assembler::emit_operand() are off. In new 
Assembler::emit_operand() is better use } else { instead of 'return' in one 
branch.
I'll make the necessary changes and send an updated webrev.

3) This is first time I see that XMM register can be used for index in address. 
Is it true? Can you point to Intel's document which describes it.
I am using vpgatherdd instruction which requires index vector with scale. It 
uses VSIB addressing where the index register is a zmm register. 
Please refer to reference manual, volume 2c, page 2211: 
https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf
Also see, section 2.3.12, page 524 for VSIB memory addressing information.

4)  Please, add tests to test/hotspot/jtreg/compiler/intrinsics/base64 (may be 
similar to sha or AES in compiler/codegen/aes) to make sure that all flags, 
intrinsic is used and it produces correct result.
I will add test cases as per your suggestion.

Please let me know if you have additional questions.

Thanks,
Smita

-Original Message-
From: Vladimir Kozlov [mailto:vladimir.koz...@oracle.com] 
Sent: Friday, June 22, 2018 12:29 PM
To: Kamath, Smita 
Cc: hotspot compiler 
Subject: Re: RFR(S) JDK-8205528: Base64 Encode Algorithm using AVX512 
Instructions

Hi Smita,

I CCing to Libs to review code changes in Base64.java.

Looks like you changed all need place to implement intrinsic.
One question so: why you have own copy of base64 charsets and not using one in 
library:

  private int encode0(byte[] src, int off, int end, byte[] dst) {
  char[] base64 = isURL ? toBase64URL : toBase64;

Some indents in new and old Assembler::emit_operand() are off.

In new Assembler::emit_operand() is better use } else { instead of 'return' in 
one branch.

This is first time I see that XMM register can be used for index in address. Is 
it true? Can you point to Intel's document which describes it.

What testing you did?

Please, add tests to test/hotspot/jtreg/compiler/intrinsics/base64 (may be 
similar to sha or AES in compiler/codegen/aes) to make sure that all flags, 
intrinsic is used and it produces correct result.

I know there is test/jdk/java/util/Base64/ tests but they may not trigger 
intrinsic usage. But you can use them as starting point for new tests.

Thanks,
Vladimir

On 6/22/18 11:47 AM, Kamath, Smita wrote:
> Hi Vladimir,
> 
> I'd like to contribute an optimization for Base64 Encoding Algorithm 
> using AVX512 Instructions. This optimization shows 1.5x improvement on
> x86_64 platform(SKL).
> 
> Link to Bug: https://bugs.openjdk.java.net/browse/JDK-8205528
> 
> Link to webrev: 
> http://cr.openjdk.java.net/~vdeshpande/Base64/webrev.00/
> 
> For testing the implementation, I have run tests under 
> test/jdk/java/util/Base64/ and also ran 
> test/jdk/com/sun/jndi/ldap/Base64Test.java
> 
> I have also run jtreg.
> 
> Please review and let me know if you have any comments.
> 
> Thanks and Regards,
> 
> Smita
>