This fix recovers XDH performance but gives back some of the P256 gains
(roughly 8-14% slower than master with the mult() intrinsic). P256 is still
faster than before the Montgomery PR, just not by as much.

The fix is to undo the 'int' return type on mult()/square(), which allowed
them to return a partially reduced result (i.e. it avoided extra reductions
when the mult() result is fed into an addition). This restores the behaviour
from before the Montgomery ECC PR.

I have a slightly better mult() intrinsic that does the reduction at the end,
but decided to use a more conservative fix and just keep the reduction in Java
(i.e. the original mult() is refactored into multImpl() and reducePositive()).
I will commit the optimizations I discovered while working on this in the next
release.
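
To illustrate the multImpl()/reducePositive() split described above, here is a
minimal toy sketch. It is not the JDK's IntegerPolynomial code: the class name,
the modulus 2^31 - 1 and the single-long representation are assumptions chosen
only to keep the example short. It shows a multiply step that leaves its result
partially reduced, followed by a separate final reduction.

```java
// Toy sketch only: not the JDK implementation. The modulus and the
// long-based representation are assumptions made for illustration.
public final class LazyReductionSketch {
    // Hypothetical modulus: the Mersenne prime 2^31 - 1.
    static final long P = (1L << 31) - 1;

    // Multiply two fully reduced values (0 <= a, b < P) and fold once.
    // The result is congruent to a*b mod P but only partially reduced:
    // it lies in [0, 2P), not necessarily in [0, P).
    static long multImpl(long a, long b) {
        long t = a * b;               // below 2^62, so it fits in a long
        return (t & P) + (t >>> 31);  // uses 2^31 == 1 (mod P)
    }

    // Branch-free final reduction from [0, 2P) down to [0, P).
    static long reducePositive(long x) {
        long d = x - P;               // negative exactly when x < P
        return d + ((d >> 63) & P);   // add P back only if d went negative
    }

    // Entry point that always returns a fully reduced result.
    static long mult(long a, long b) {
        return reducePositive(multImpl(a, b));
    }

    public static void main(String[] args) {
        long a = P - 1;  // represents -1 mod P
        System.out.println("multImpl result >= P: " + (multImpl(a, a) >= P));
        System.out.println("mult result:          " + mult(a, a));
    }
}
```

In this toy, callers of mult() never see a value outside [0, P), which is the
conservative contract; the cost is one extra reduction per multiplication.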

---

Performance before Montgomery PR:

Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt    3  6398.727 ±  7.400  ops/s
SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt    3  6129.739 ±  5.995  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt    3  1889.928 ± 54.660  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt    3  1866.339 ± 42.438  ops/s

Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3  1350.745 ± 28.514  ops/s
o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3  1349.393 ± 32.050  ops/s

Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
KeyAgreementBench.XDH.generateSecret          XDH          255             XDH              thrpt    3  8435.277 ± 27.230  ops/s

Performance in master without mult() intrinsic

Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score     Error  Units
SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt    3  6539.589 ± 132.844  ops/s
SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt    3  6202.530 ± 124.496  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt    3  1967.038 ±  15.819  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt    3  1931.667 ±  22.901  ops/s

Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3  1354.143 ± 24.861  ops/s
o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3  1354.139 ± 21.904  ops/s


Performance in master with mult() intrinsic

Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt      Score     Error  Units
SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt    3  10534.707 ±  20.690  ops/s
SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt    3   9729.246 ± 102.803  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt    3   3549.011 ±  77.343  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt    3   3458.107 ±  14.622  ops/s

Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3  2563.566 ± 94.381  ops/s
o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3  2569.143 ± 53.337  ops/s

Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
KeyAgreementBench.XDH.generateSecret          XDH          255             XDH              thrpt    3  8309.028 ± 22.071  ops/s


THIS PR without mult intrinsic

Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score     Error  Units
SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt    3  6225.541 ± 111.874  ops/s
SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt    3  5913.876 ± 121.556  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt    3  1837.740 ±  42.881  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt    3  1815.064 ±  72.015  ops/s

Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3  1271.716 ± 17.119  ops/s
o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3  1265.405 ± 19.382  ops/s


THIS PR with mult intrinsic

Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score     Error  Units
SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt    3  9560.700 ± 232.557  ops/s
SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt    3  8916.806 ± 164.756  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt    3  3064.470 ±  72.166  ops/s
SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt    3  2991.568 ±  75.720  ops/s

Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3  2200.308 ± 13.744  ops/s
o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3  2203.028 ±  1.948  ops/s

Benchmark                             (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score    Error  Units
KeyAgreementBench.XDH.generateSecret          XDH          255             XDH              thrpt    3  8514.924 ± 59.022  ops/s
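
The 8-14% figure quoted in the summary can be checked against the two "with
mult intrinsic" runs. The snippet below is only a convenience sketch (the class
and method names are made up for this note); the scores are copied from the
tables above, and it prints the change of this PR relative to master for each
benchmark.

```java
// Convenience sketch for reading the tables; names here are hypothetical.
public final class RelativeChange {
    // Percentage change of 'after' relative to 'before' (negative means slower).
    static double pctChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        String[] names = {
            "ECDSA.sign 1024", "ECDSA.sign 16384",
            "ECDSA.verify 1024", "ECDSA.verify 16384",
            "ECDH (full)", "ECDH (small)", "XDH"
        };
        // ops/s with the mult intrinsic: {master, this PR}
        double[][] scores = {
            {10534.707, 9560.700}, {9729.246, 8916.806},
            {3549.011, 3064.470},  {3458.107, 2991.568},
            {2563.566, 2200.308},  {2569.143, 2203.028},
            {8309.028, 8514.924}
        };
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-20s %+7.2f%%%n",
                    names[i], pctChange(scores[i][0], scores[i][1]));
        }
    }
}
```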

-------------

Commit messages:
 - whitespace
 - better reduction refactoring
 - Undo incomplete p256 mult reduction optimization

Changes: https://git.openjdk.org/jdk/pull/19728/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19728&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8333583
  Stats: 130 lines in 9 files changed: 53 ins; 37 del; 40 mod
  Patch: https://git.openjdk.org/jdk/pull/19728.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19728/head:pull/19728

PR: https://git.openjdk.org/jdk/pull/19728
