Re: RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows

2020-06-24 Thread Simon Tooke



On 2020-06-24 4:25 p.m., Andrew Hughes wrote:

On 22/06/2020 10:00, Andrew Haley wrote:

On 18/06/2020 19:33, Martin Balao wrote:

snip...


   * L3724
* The last argument of 'sub' has type 'int', while in the not-Windows
variant is a long. Can we align this?

We should do that, yes. Better it be long everywhere.


Sorry, I should have responded to this sooner.

'long' is 32 bits on Windows x64, and 64 bits on other platforms.

sub() is only ever called with an 'int', so this change would actually 
cause the windows code to move to a jlong (i.e. int64_t), if we're 
talking about function signature.


If we're only talking about aesthetics, the type 'long' works, but it's 
somewhat misleading and doesn't actually change the code





Patch looks good to me too.

Simon, if you are happy with changing the final argument of sub to long
on the Windows side, I'll sponsor this patch, make that minor change and
push:

I'm fine with this change, and thank you.


https://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/01/01/jdk.patch

to jdk/jdk

Thanks,




Re: RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows

2020-06-24 Thread Andrew Hughes
On 22/06/2020 10:00, Andrew Haley wrote:
> On 18/06/2020 19:33, Martin Balao wrote:

snip...

>>   * L3724
>>* The last argument of 'sub' has type 'int', while in the not-Windows
>> variant is a long. Can we align this?
> 
> We should do that, yes. Better it be long everywhere.
> 

Patch looks good to me too.

Simon, if you are happy with changing the final argument of sub to long
on the Windows side, I'll sponsor this patch, make that minor change and
push:

https://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/01/01/jdk.patch

to jdk/jdk

Thanks,
-- 
Andrew :)

Senior Free Java Software Engineer
Red Hat, Inc. (http://www.redhat.com)

PGP Key: ed25519/0xCFDA0F9B35964222 (hkp://keys.gnupg.net)
Fingerprint = 5132 579D D154 0ED2 3E04  C5A0 CFDA 0F9B 3596 4222



signature.asc
Description: OpenPGP digital signature


Re: RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows

2020-06-22 Thread Andrew Haley
On 18/06/2020 19:33, Martin Balao wrote:
>  * sharedRuntime_x86_64.cpp
>   * L3685
>* Do we still need 'long long' type for 'i' and 'cnt' local variables?

No, but this is 64-bit-only code. And len is a long, so let's keep it.

>   * L3724
>* The last argument of 'sub' has type 'int', while in the not-Windows
> variant is a long. Can we align this?

We should do that, yes. Better it be long everywhere.

>   * L3729
>* Is it possible to directly store in a[i]? (instead of going through
> 'tmp')

I'm not sure this would add anything. In fact, I think this change would
make it harder to read.

> * I guess the compiler will easily optimize this, but we may still
> get rid of the 2nd line
> * I've seen in L3753 you directly store
>
> Note: it's a bit unfortunate that we don't have x86-64 inline assembly
> in CL to maintain the same logic, as there is nothing OS-specific here.

I think the problem is that MS never had a nice way to do inline
assembly in 32-bit C, so they just gave up altogether when they went
64-bit.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671



Re: RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows

2020-06-18 Thread Martin Balao
Hi,

On 6/5/20 5:46 PM, Simon Tooke wrote:
> Please let me know what you think.
> 
> updated webrev:
> http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/01/01/
> 

Overall, the intrinsics looks good to me.

A few minor comments:

 * sharedRuntime_x86_64.cpp
  * L3685
   * Do we still need 'long long' type for 'i' and 'cnt' local variables?
  * L3724
   * The last argument of 'sub' has type 'int', while in the not-Windows
variant is a long. Can we align this?
  * L3729
   * Is it possible to directly store in a[i]? (instead of going through
'tmp')
* I guess the compiler will easily optimize this, but we may still
get rid of the 2nd line
* I've seen in L3753 you directly store

Note: it's a bit unfortunate that we don't have x86-64 inline assembly
in CL to maintain the same logic, as there is nothing OS-specific here.

Note 2: I'm not an official jdk-mainline reviewer, so we still need
someone else who approves.

Thanks,
Martin.-



Re: RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows

2020-06-08 Thread Andrew Haley
On 05/06/2020 21:46, Simon Tooke wrote:
> As per your and Andrew Haley's comments, I have updated the webrev:
> 
> - used NOINLINE
> 
> - used julong
> 
> - deleted the block of unused code.
> 
> Please let me know what you think.
> 
> updated webrev: 
> http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/01/01/

That looks clean, especially since it adds support for Windows with
essentially no impact on the Linux code. I'm not sure I can Review
it, though, because I'm one of the co-authors.

It'd be nice if someone else familiar with the arithmetic (someone on
security-dev?) could take a quick look.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671



Re: RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows

2020-06-08 Thread David Holmes

Hi Simon,

On 6/06/2020 6:46 am, Simon Tooke wrote:

Thanks again for the review.

As per your and Andrew Haley's comments, I have updated the webrev:

- used NOINLINE

- used julong

- deleted the block of unused code.

Please let me know what you think.

updated webrev: 
http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/01/01/


All my comments have been addressed - thanks.

David


Thanks,

-Simon



On 2020-06-05 12:35 a.m., David Holmes wrote:

Hi Simon,

On 4/06/2020 11:35 pm, Simon Tooke wrote:

Hello, David, and thanks for the review!

I've responded to your comments below, and intend to post a new patch 
for review today or tomorrow.


Thanks again,

-Simon

On 2020-06-03 2:06 a.m., David Holmes wrote:

Hi Simon,

On 23/05/2020 12:04 am, Sean Mullan wrote:
Cross-posting to hotspot-dev for additional review since the code 
changes are in hotspot.


--Sean

On 5/21/20 1:24 PM, Simon Tooke wrote:

Hello,

I'd like to request a review for:

JBS: https://bugs.openjdk.java.net/browse/JDK-8243114

Webrev: 
http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/00/00/


This change implements the given intrinsics on Windows.

The Windows toolchain has no 128 bit integer types, and no inline 
asm (on x64 and Arm).  In addition, 'long' is 4 bytes, not 8, as 
it is with gcc.  This patch had to change some of the linux 
implementation to account for these limitations.


I can't really comment on the intrinsics themselves but a couple of 
suggestions:


- use explicitly sized types e.g. uint64_t instead of unsigned long 
long


Yes, this hurt to type.  A previous review suggested using julong, is 
that acceptable to you?


j* types should only be used when dealing with values that come from 
or go to Java. Obviously julong is a misnomer as Java doesn't have an 
unsigned type, but in general this is something we are trying to fix 
up in the codebase. If these arrays are extracted from Java arrays 
then using a j* type would be appropriate, but then I would expect 
jlong, unless the algorithm explicitly requires unsigned types? If so 
julong would be acceptable.


(an aside: I'm now wondering if there is other code in the JDK that 
assumes long is 64bits - which is not true on all platforms...)


There has been such code, but hopefully there is no remaining actively 
used code with that bug. There are using of "long" that should be 
eradicated (there's an open JBS issue for that as I recall) but the 
remaining uses don't seem to require the long be 64-bit.



- use the existing NOINLINE macro for the _declspec(noinline)

Thanks, will do.


The conditional compilation in this code has me quite confused. 
Looking at the existing code we have:


3680 #ifndef _WINDOWS
3681
3682 #define ASM_SUBTRACT
3683
3684 #ifdef ASM_SUBTRACT
...
3702 #else // ASM_SUBTRACT
...
3719 #endif // ! ASM_SUBTRACT

So all the above code is only built on not-Windows, and we 
unconditionally define ASM_SUBTRACT, so lines 3702 to 3719 appear to 
be dead code! I'm not at all sure whether that is actually a bug and 
the windows ifndef should have had an endif after line 3682; or 
whether we can just simplify the code.
The dead code existed prior to this patch, so I wasn't proposing 
removing it.  I'm happy to do so if that's for the best.  As far as I 
can tell, it's for implementing these intrinsics on gcc platforms 
without assembler.


AFAICS Andrew originally had the ASM_SUBTRACT parts, with the 
unconditional #define, but there was no explanation in the review 
thread as to why the unused code remained present. Then before 
integration all the code was wrapped by the ifndef Windows to exclude 
it from Windows.


I think it needs to be fixed as you are making changes to the Windows 
part and it is very hard to establish how the dead code should look in 
that case.


Thanks,
David



Thanks,
David



My gratitude for Andrew Haley for doing the heavy lifting at the 
core of this patch.


The patch, if accepted, will be offered to 11u as a potential 
backport. The changes apply cleanly modulo some line number changes.


As for the speedup, this test case:

BigInteger base = BigInteger.ONE.shiftLeft(1024);
long count = LongStream.rangeClosed(2, 100_000)
 .mapToObj(n -> BigInteger.valueOf(n).add(base))
 .filter(i -> i.isProbablePrime(50))
 .count();

goes from 1 minute 20 seconds down to about 35 seconds om my VM, 
over multiple runs.  As is the case on other platforms where the 
intrinsics are supported, they will be enabled by default on Windows.


Thank you for your time,

-Simon Tooke


Principal Software Engineer,

Red Hat Canada













Re: RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows

2020-06-05 Thread Simon Tooke

Thanks again for the review.

As per your and Andrew Haley's comments, I have updated the webrev:

- used NOINLINE

- used julong

- deleted the block of unused code.

Please let me know what you think.

updated webrev: 
http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/01/01/


Thanks,

-Simon



On 2020-06-05 12:35 a.m., David Holmes wrote:

Hi Simon,

On 4/06/2020 11:35 pm, Simon Tooke wrote:

Hello, David, and thanks for the review!

I've responded to your comments below, and intend to post a new patch 
for review today or tomorrow.


Thanks again,

-Simon

On 2020-06-03 2:06 a.m., David Holmes wrote:

Hi Simon,

On 23/05/2020 12:04 am, Sean Mullan wrote:
Cross-posting to hotspot-dev for additional review since the code 
changes are in hotspot.


--Sean

On 5/21/20 1:24 PM, Simon Tooke wrote:

Hello,

I'd like to request a review for:

JBS: https://bugs.openjdk.java.net/browse/JDK-8243114

Webrev: 
http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/00/00/


This change implements the given intrinsics on Windows.

The Windows toolchain has no 128 bit integer types, and no inline 
asm (on x64 and Arm).  In addition, 'long' is 4 bytes, not 8, as 
it is with gcc.  This patch had to change some of the linux 
implementation to account for these limitations.


I can't really comment on the intrinsics themselves but a couple of 
suggestions:


- use explicitly sized types e.g. uint64_t instead of unsigned long 
long


Yes, this hurt to type.  A previous review suggested using julong, is 
that acceptable to you?


j* types should only be used when dealing with values that come from 
or go to Java. Obviously julong is a misnomer as Java doesn't have an 
unsigned type, but in general this is something we are trying to fix 
up in the codebase. If these arrays are extracted from Java arrays 
then using a j* type would be appropriate, but then I would expect 
jlong, unless the algorithm explicitly requires unsigned types? If so 
julong would be acceptable.


(an aside: I'm now wondering if there is other code in the JDK that 
assumes long is 64bits - which is not true on all platforms...)


There has been such code, but hopefully there is no remaining actively 
used code with that bug. There are using of "long" that should be 
eradicated (there's an open JBS issue for that as I recall) but the 
remaining uses don't seem to require the long be 64-bit.



- use the existing NOINLINE macro for the _declspec(noinline)

Thanks, will do.


The conditional compilation in this code has me quite confused. 
Looking at the existing code we have:


3680 #ifndef _WINDOWS
3681
3682 #define ASM_SUBTRACT
3683
3684 #ifdef ASM_SUBTRACT
...
3702 #else // ASM_SUBTRACT
...
3719 #endif // ! ASM_SUBTRACT

So all the above code is only built on not-Windows, and we 
unconditionally define ASM_SUBTRACT, so lines 3702 to 3719 appear to 
be dead code! I'm not at all sure whether that is actually a bug and 
the windows ifndef should have had an endif after line 3682; or 
whether we can just simplify the code.
The dead code existed prior to this patch, so I wasn't proposing 
removing it.  I'm happy to do so if that's for the best.  As far as I 
can tell, it's for implementing these intrinsics on gcc platforms 
without assembler.


AFAICS Andrew originally had the ASM_SUBTRACT parts, with the 
unconditional #define, but there was no explanation in the review 
thread as to why the unused code remained present. Then before 
integration all the code was wrapped by the ifndef Windows to exclude 
it from Windows.


I think it needs to be fixed as you are making changes to the Windows 
part and it is very hard to establish how the dead code should look in 
that case.


Thanks,
David



Thanks,
David



My gratitude for Andrew Haley for doing the heavy lifting at the 
core of this patch.


The patch, if accepted, will be offered to 11u as a potential 
backport. The changes apply cleanly modulo some line number changes.


As for the speedup, this test case:

BigInteger base = BigInteger.ONE.shiftLeft(1024);
long count = LongStream.rangeClosed(2, 100_000)
 .mapToObj(n -> BigInteger.valueOf(n).add(base))
 .filter(i -> i.isProbablePrime(50))
 .count();

goes from 1 minute 20 seconds down to about 35 seconds om my VM, 
over multiple runs.  As is the case on other platforms where the 
intrinsics are supported, they will be enabled by default on Windows.


Thank you for your time,

-Simon Tooke


Principal Software Engineer,

Red Hat Canada













Re: RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows

2020-06-05 Thread Andrew Haley
On 05/06/2020 05:35, David Holmes wrote:
> If these arrays are extracted from Java arrays

Yes, they are.

> then using a j* type would be appropriate, but then I would expect
> jlong, unless the algorithm explicitly requires unsigned types?

It does. All of the code handling bignums uses unsigned words.

> If so julong would be acceptable.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671



Re: RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows

2020-06-04 Thread David Holmes

Hi Simon,

On 4/06/2020 11:35 pm, Simon Tooke wrote:

Hello, David, and thanks for the review!

I've responded to your comments below, and intend to post a new patch 
for review today or tomorrow.


Thanks again,

-Simon

On 2020-06-03 2:06 a.m., David Holmes wrote:

Hi Simon,

On 23/05/2020 12:04 am, Sean Mullan wrote:
Cross-posting to hotspot-dev for additional review since the code 
changes are in hotspot.


--Sean

On 5/21/20 1:24 PM, Simon Tooke wrote:

Hello,

I'd like to request a review for:

JBS: https://bugs.openjdk.java.net/browse/JDK-8243114

Webrev: 
http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/00/00/


This change implements the given intrinsics on Windows.

The Windows toolchain has no 128 bit integer types, and no inline 
asm (on x64 and Arm).  In addition, 'long' is 4 bytes, not 8, as it 
is with gcc.  This patch had to change some of the linux 
implementation to account for these limitations.


I can't really comment on the intrinsics themselves but a couple of 
suggestions:


- use explicitly sized types e.g. uint64_t instead of unsigned long long


Yes, this hurt to type.  A previous review suggested using julong, is 
that acceptable to you?


j* types should only be used when dealing with values that come from or 
go to Java. Obviously julong is a misnomer as Java doesn't have an 
unsigned type, but in general this is something we are trying to fix up 
in the codebase. If these arrays are extracted from Java arrays then 
using a j* type would be appropriate, but then I would expect jlong, 
unless the algorithm explicitly requires unsigned types? If so julong 
would be acceptable.


(an aside: I'm now wondering if there is other code in the JDK that 
assumes long is 64bits - which is not true on all platforms...)


There has been such code, but hopefully there is no remaining actively 
used code with that bug. There are using of "long" that should be 
eradicated (there's an open JBS issue for that as I recall) but the 
remaining uses don't seem to require the long be 64-bit.



- use the existing NOINLINE macro for the _declspec(noinline)

Thanks, will do.


The conditional compilation in this code has me quite confused. 
Looking at the existing code we have:


3680 #ifndef _WINDOWS
3681
3682 #define ASM_SUBTRACT
3683
3684 #ifdef ASM_SUBTRACT
...
3702 #else // ASM_SUBTRACT
...
3719 #endif // ! ASM_SUBTRACT

So all the above code is only built on not-Windows, and we 
unconditionally define ASM_SUBTRACT, so lines 3702 to 3719 appear to 
be dead code! I'm not at all sure whether that is actually a bug and 
the windows ifndef should have had an endif after line 3682; or 
whether we can just simplify the code.
The dead code existed prior to this patch, so I wasn't proposing 
removing it.  I'm happy to do so if that's for the best.  As far as I 
can tell, it's for implementing these intrinsics on gcc platforms 
without assembler.


AFAICS Andrew originally had the ASM_SUBTRACT parts, with the 
unconditional #define, but there was no explanation in the review thread 
as to why the unused code remained present. Then before integration all 
the code was wrapped by the ifndef Windows to exclude it from Windows.


I think it needs to be fixed as you are making changes to the Windows 
part and it is very hard to establish how the dead code should look in 
that case.


Thanks,
David



Thanks,
David



My gratitude for Andrew Haley for doing the heavy lifting at the 
core of this patch.


The patch, if accepted, will be offered to 11u as a potential 
backport. The changes apply cleanly modulo some line number changes.


As for the speedup, this test case:

BigInteger base = BigInteger.ONE.shiftLeft(1024);
long count = LongStream.rangeClosed(2, 100_000)
 .mapToObj(n -> BigInteger.valueOf(n).add(base))
 .filter(i -> i.isProbablePrime(50))
 .count();

goes from 1 minute 20 seconds down to about 35 seconds om my VM, 
over multiple runs.  As is the case on other platforms where the 
intrinsics are supported, they will be enabled by default on Windows.


Thank you for your time,

-Simon Tooke


Principal Software Engineer,

Red Hat Canada









Re: RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows

2020-06-04 Thread Andrew Haley
On 04/06/2020 14:35, Simon Tooke wrote:
> Yes, this hurt to type.  A previous review suggested using julong, is 
> that acceptable to you?
> 
> (an aside: I'm now wondering if there is other code in the JDK that 
> assumes long is 64bits - which is not true on all platforms...)

That was just me, I think. I knew that this code would take some
fixing up on Windows so I postponed portability issues until then.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671



Re: RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows

2020-06-04 Thread Simon Tooke

Hello, David, and thanks for the review!

I've responded to your comments below, and intend to post a new patch 
for review today or tomorrow.


Thanks again,

-Simon

On 2020-06-03 2:06 a.m., David Holmes wrote:

Hi Simon,

On 23/05/2020 12:04 am, Sean Mullan wrote:
Cross-posting to hotspot-dev for additional review since the code 
changes are in hotspot.


--Sean

On 5/21/20 1:24 PM, Simon Tooke wrote:

Hello,

I'd like to request a review for:

JBS: https://bugs.openjdk.java.net/browse/JDK-8243114

Webrev: 
http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/00/00/


This change implements the given intrinsics on Windows.

The Windows toolchain has no 128 bit integer types, and no inline 
asm (on x64 and Arm).  In addition, 'long' is 4 bytes, not 8, as it 
is with gcc.  This patch had to change some of the linux 
implementation to account for these limitations.


I can't really comment on the intrinsics themselves but a couple of 
suggestions:


- use explicitly sized types e.g. uint64_t instead of unsigned long long


Yes, this hurt to type.  A previous review suggested using julong, is 
that acceptable to you?


(an aside: I'm now wondering if there is other code in the JDK that 
assumes long is 64bits - which is not true on all platforms...)



- use the existing NOINLINE macro for the _declspec(noinline)

Thanks, will do.


The conditional compilation in this code has me quite confused. 
Looking at the existing code we have:


3680 #ifndef _WINDOWS
3681
3682 #define ASM_SUBTRACT
3683
3684 #ifdef ASM_SUBTRACT
...
3702 #else // ASM_SUBTRACT
...
3719 #endif // ! ASM_SUBTRACT

So all the above code is only built on not-Windows, and we 
unconditionally define ASM_SUBTRACT, so lines 3702 to 3719 appear to 
be dead code! I'm not at all sure whether that is actually a bug and 
the windows ifndef should have had an endif after line 3682; or 
whether we can just simplify the code.
The dead code existed prior to this patch, so I wasn't proposing 
removing it.  I'm happy to do so if that's for the best.  As far as I 
can tell, it's for implementing these intrinsics on gcc platforms 
without assembler.


Thanks,
David



My gratitude for Andrew Haley for doing the heavy lifting at the 
core of this patch.


The patch, if accepted, will be offered to 11u as a potential 
backport. The changes apply cleanly modulo some line number changes.


As for the speedup, this test case:

BigInteger base = BigInteger.ONE.shiftLeft(1024);
long count = LongStream.rangeClosed(2, 100_000)
 .mapToObj(n -> BigInteger.valueOf(n).add(base))
 .filter(i -> i.isProbablePrime(50))
 .count();

goes from 1 minute 20 seconds down to about 35 seconds om my VM, 
over multiple runs.  As is the case on other platforms where the 
intrinsics are supported, they will be enabled by default on Windows.


Thank you for your time,

-Simon Tooke


Principal Software Engineer,

Red Hat Canada









Re: RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows

2020-06-03 Thread David Holmes

Hi Simon,

On 23/05/2020 12:04 am, Sean Mullan wrote:
Cross-posting to hotspot-dev for additional review since the code 
changes are in hotspot.


--Sean

On 5/21/20 1:24 PM, Simon Tooke wrote:

Hello,

I'd like to request a review for:

JBS: https://bugs.openjdk.java.net/browse/JDK-8243114

Webrev: http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/00/00/

This change implements the given intrinsics on Windows.

The Windows toolchain has no 128 bit integer types, and no inline asm 
(on x64 and Arm).  In addition, 'long' is 4 bytes, not 8, as it is 
with gcc.  This patch had to change some of the linux implementation 
to account for these limitations.


I can't really comment on the intrinsics themselves but a couple of 
suggestions:


- use explicitly sized types e.g. uint64_t instead of unsigned long long
- use the existing NOINLINE macro for the _declspec(noinline)

The conditional compilation in this code has me quite confused. Looking 
at the existing code we have:


3680 #ifndef _WINDOWS
3681
3682 #define ASM_SUBTRACT
3683
3684 #ifdef ASM_SUBTRACT
...
3702 #else // ASM_SUBTRACT
...
3719 #endif // ! ASM_SUBTRACT

So all the above code is only built on not-Windows, and we 
unconditionally define ASM_SUBTRACT, so lines 3702 to 3719 appear to be 
dead code! I'm not at all sure whether that is actually a bug and the 
windows ifndef should have had an endif after line 3682; or whether we 
can just simplify the code.


Thanks,
David



My gratitude for Andrew Haley for doing the heavy lifting at the core 
of this patch.


The patch, if accepted, will be offered to 11u as a potential 
backport. The changes apply cleanly modulo some line number changes.


As for the speedup, this test case:

BigInteger base = BigInteger.ONE.shiftLeft(1024);
long count = LongStream.rangeClosed(2, 100_000)
 .mapToObj(n -> BigInteger.valueOf(n).add(base))
 .filter(i -> i.isProbablePrime(50))
 .count();

goes from 1 minute 20 seconds down to about 35 seconds om my VM, over 
multiple runs.  As is the case on other platforms where the intrinsics 
are supported, they will be enabled by default on Windows.


Thank you for your time,

-Simon Tooke


Principle Software Engineer,

Red Hat Canada





Re: RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows

2020-05-27 Thread Andrew Haley
On 21/05/2020 18:24, Simon Tooke wrote:
> This change implements the given intrinsics on Windows.

Thank you.

Does julong work on Windows, rather than unsigned long long?  If not,
perhaps a local typedef might help. All those "unsigned log long"s
look a bit clumsy.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671



Re: RFR[15] JDK-8243114: Implement montgomery{Multiply, Square}intrinsics on Windows

2020-05-22 Thread Sean Mullan
Cross-posting to hotspot-dev for additional review since the code 
changes are in hotspot.


--Sean

On 5/21/20 1:24 PM, Simon Tooke wrote:

Hello,

I'd like to request a review for:

JBS: https://bugs.openjdk.java.net/browse/JDK-8243114

Webrev: http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/00/00/

This change implements the given intrinsics on Windows.

The Windows toolchain has no 128 bit integer types, and no inline asm 
(on x64 and Arm).  In addition, 'long' is 4 bytes, not 8, as it is with 
gcc.  This patch had to change some of the linux implementation to 
account for these limitations.


My gratitude for Andrew Haley for doing the heavy lifting at the core of 
this patch.


The patch, if accepted, will be offered to 11u as a potential backport. 
The changes apply cleanly modulo some line number changes.


As for the speedup, this test case:

BigInteger base = BigInteger.ONE.shiftLeft(1024);
long count = LongStream.rangeClosed(2, 100_000)
     .mapToObj(n -> BigInteger.valueOf(n).add(base))
     .filter(i -> i.isProbablePrime(50))
     .count();

goes from 1 minute 20 seconds down to about 35 seconds om my VM, over 
multiple runs.  As is the case on other platforms where the intrinsics 
are supported, they will be enabled by default on Windows.


Thank you for your time,

-Simon Tooke


Principle Software Engineer,

Red Hat Canada