Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]

2024-05-21 Thread Volodymyr Paprotski
On Tue, 21 May 2024 07:21:14 GMT, Tobias Hartmann  wrote:

>> Volodymyr Paprotski has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   shenandoah verifier
>
> I'm getting some conflicts when trying to apply this to master. Could you 
> please merge the PR?

Hi @TobiHartmann, merged with no issues for me. Could you please run the tests 
again? (I think Tony did run them, but it can't hurt to verify again.) Thanks!

-

PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2123122468


Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]

2024-05-21 Thread Tobias Hartmann
On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski  wrote:

>> Performance. Before:
>> 
>> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error      Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  3    6443.934  ± 6.491    ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  3    6152.979  ± 4.954    ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  3    1895.410  ± 36.979   ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  3    1878.955  ± 45.487   ops/s
>> 
>> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error      Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret   ECDH         256          EC                          thrpt  3    1357.810  ± 26.584   ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  3    1352.119  ± 23.547   ops/s
>> 
>> Benchmark                          (isMontBench)  Mode   Cnt  Score     Error      Units
>> PolynomialP256Bench.benchMultiply  false          thrpt  3    1746.126  ± 10.970   ops/s
>> 
>> Performance, no intrinsic:
>> 
>> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error      Units
>> SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  3    6529.839  ± 42.420   ops/s
>> SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  3    6199.747  ± 133.566  ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  3    1973.676  ± 54.071   ops/s
>> SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  3    1932.127  ± 35.920   ops/s
>> 
>> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error      Units
>> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret   ECDH         256          EC                          thrpt  3    1355.788  ± 29.858   ops/s
>> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  3    1346.523  ± 28.722   ops/s
>> 
>> Benchmark                          (isMontBench)  Mode   Cnt  Score     Error      Units
>> PolynomialP256Bench.benchMultiply  true           thrpt  3    1919.57...
>
> Volodymyr Paprotski has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   shenandoah verifier

I'm getting some conflicts when trying to apply this to master. Could you 
please merge the PR?

-

PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2121929550


Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]

2024-05-21 Thread Tobias Hartmann
On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski  wrote:

>
> Volodymyr Paprotski has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   shenandoah verifier

I'll send this through our testing and will report back once it has passed.

-

PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2121914071


Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]

2024-05-17 Thread Volodymyr Paprotski
On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski  wrote:

>
> Volodymyr Paprotski has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   shenandoah verifier

Thanks Sandhya!

Now that I have @ascarpino approval as well, I plan to integrate next Tuesday.

-

PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2118443577


Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]

2024-05-17 Thread Sandhya Viswanathan
On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski  wrote:

>
> Volodymyr Paprotski has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   shenandoah verifier

Marked as reviewed by sviswanathan (Reviewer).

The intrinsics and the C2 changes look good to me.

-

PR Review: https://git.openjdk.org/jdk/pull/18583#pullrequestreview-2064439617
PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2118426661


Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]

2024-05-17 Thread Volodymyr Paprotski
> Performance. Before:
> 
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error      Units
> SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  3    6443.934  ± 6.491    ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  3    6152.979  ± 4.954    ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  3    1895.410  ± 36.979   ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  3    1878.955  ± 45.487   ops/s
> 
> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error      Units
> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret   ECDH         256          EC                          thrpt  3    1357.810  ± 26.584   ops/s
> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  3    1352.119  ± 23.547   ops/s
> 
> Benchmark                          (isMontBench)  Mode   Cnt  Score     Error      Units
> PolynomialP256Bench.benchMultiply  false          thrpt  3    1746.126  ± 10.970   ops/s
> 
> Performance, no intrinsic:
> 
> Benchmark                    (algorithm)      (dataSize)  (keyLength)  (provider)  Mode   Cnt  Score     Error      Units
> SignatureBench.ECDSA.sign    SHA256withECDSA  1024        256                      thrpt  3    6529.839  ± 42.420   ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA  16384       256                      thrpt  3    6199.747  ± 133.566  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA  1024        256                      thrpt  3    1973.676  ± 54.071   ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA  16384       256                      thrpt  3    1932.127  ± 35.920   ops/s
> 
> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)  Mode   Cnt  Score     Error      Units
> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret   ECDH         256          EC                          thrpt  3    1355.788  ± 29.858   ops/s
> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret  ECDH         256          EC                          thrpt  3    1346.523  ± 28.722   ops/s
> 
> Benchmark                          (isMontBench)  Mode   Cnt  Score     Error      Units
> PolynomialP256Bench.benchMultiply  true           thrpt  3    1919.574  ± 10.591   ops/s
> 
> Performance, **with intrinsics*...

Volodymyr Paprotski has updated the pull request incrementally with one 
additional commit since the last revision:

  shenandoah verifier

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/18583/files
  - new: https://git.openjdk.org/jdk/pull/18583/files/5c360e35..df4fe6fa

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=09-10

  Stats: 7 lines in 2 files changed: 6 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18583.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18583/head:pull/18583

PR: https://git.openjdk.org/jdk/pull/18583