Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v12]
On Wed, 22 May 2024 14:19:36 GMT, Volodymyr Paprotski wrote:

>> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision:
>>
>>  - Merge remote-tracking branch 'origin/master' into ecc-montgomery
>>  - shenandoah verifier
>>  - comments from Sandhya
>>  - whitespace
>>  - add message back
>>  - whitespace
>>  - Use AffinePoint to exit Montgomery domain
>>
>>    Style notes:
>>    Affine.equals()
>>    - Mismatched fields only appear to be used from testing, perhaps should be moved there instead
>>    Affine.getX(boolean)|getY(boolean)
>>    - "Passing flag is bad design" - cleanest/performant alternative to several instanceof checks
>>    - needed to convert Affine to Projective (need to stay in montgomery domain)
>>    ECOperations.PointMultiplier
>>    - changes could probably be restored to original (since ProjectivePoint handling no longer required)
>>    - consider these changes an improvement? (fewer nested classes)
>>    - was an inner-class but not using inner-class features (i.e. ecOps variable should be converted)
>>  - whitespace
>>  - Comments from Tony and Jatin
>>  - Comments from Jatin and Tony
>>  - ... and 7 more: https://git.openjdk.org/jdk/compare/1adfff34...b1a33004
>
> Thanks Tobi!

Unfortunately, this caused a performance regression, see [JDK-8333583](https://bugs.openjdk.org/browse/JDK-8333583).

@vpaprotsk, please have a look.

PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2149576062
Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v12]
On Tue, 21 May 2024 17:41:46 GMT, Volodymyr Paprotski wrote:

>> Performance. Before:
>> [...]
>
> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision:
> [...]

Thanks Tobi!

PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2124924526
Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v12]
On Tue, 21 May 2024 17:41:46 GMT, Volodymyr Paprotski wrote:

>> Performance. Before:
>> [...]
>
> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision:
> [...]

All tests passed.

PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2124892444
Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v12]
On Tue, 21 May 2024 17:41:46 GMT, Volodymyr Paprotski wrote:

>> Performance. Before:
>> [...]
>
> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision:
> [...]

Thanks! I submitted testing and will report back once it has passed.

PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2123869579
Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v12]
> Performance. Before:
>
> Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score     Error  Units
> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt    3  6443.934 ±   6.491  ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt    3  6152.979 ±   4.954  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt    3  1895.410 ±  36.979  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt    3  1878.955 ±  45.487  ops/s
>
> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score     Error  Units
> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3  1357.810 ±  26.584  ops/s
> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3  1352.119 ±  23.547  ops/s
>
> Benchmark                          (isMontBench)   Mode  Cnt     Score     Error  Units
> PolynomialP256Bench.benchMultiply          false  thrpt    3  1746.126 ±  10.970  ops/s
>
> Performance, no intrinsic:
>
> Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score     Error  Units
> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt    3  6529.839 ±  42.420  ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt    3  6199.747 ± 133.566  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt    3  1973.676 ±  54.071  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt    3  1932.127 ±  35.920  ops/s
>
> Benchmark                                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score     Error  Units
> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret          ECDH          256              EC              thrpt    3  1355.788 ±  29.858  ops/s
> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt    3  1346.523 ±  28.722  ops/s
>
> Benchmark                          (isMontBench)   Mode  Cnt     Score     Error  Units
> PolynomialP256Bench.benchMultiply           true  thrpt    3  1919.574 ±  10.591  ops/s
>
> Performance, **with intrinsics*...

Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision:

 - Merge remote-tracking branch 'origin/master' into ecc-montgomery
 - shenandoah verifier
 - comments from Sandhya
 - whitespace
 - add message back
 - whitespace
 - Use AffinePoint to exit Montgomery domain

   Style notes:
   Affine.equals()
   - Mismatched fields only appear to be used from testing, perhaps should be moved there instead
   Affine.getX(boolean)|getY(boolean)
   - "Passing flag is bad design" - cleanest/performant alternative to several instanceof checks
   - needed to convert Affine to Projective (need to stay in montgomery domain)
   ECOperations.PointMultiplier
   - changes could probably be restored to original (since ProjectivePoint handling no longer required)
   - consider these changes an improvement? (fewer nested classes)
   - was an inner-class but not using inner-class features (i.e. ecOps variable should be converted)
 - whitespace
 - Comments from Tony and Jatin
 - Comments from Jatin and Tony
 - ... and 7 more: https://git.openjdk.org/jdk/compare/12e8009b...b1a33004

Changes:
  - all: https://git.openjdk.org/jdk/pull/18583/files
  - new: https://git.openjdk.org/jdk/pull/18583/files/df4fe6fa..b1a33004

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18583&range=10-11

  Stats: 190975 lines in 3949 files changed: 105304 ins; 64688 del; 20983 mod
  Patch: https://git.openjdk.org/jdk/pull/18583.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18583/head:pull/18583

PR: https://git.openjdk.org/jdk/pull/18583
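
For readers unfamiliar with the "Montgomery domain" mentioned in the style notes, here is a minimal sketch of what entering it, staying in it and exiting it means for the P-256 field prime. It is not the code from this pull request: the class `MontgomerySketch` and the helpers `toMont`, `montMul`, `fromMont` and `redc` are hypothetical names, and it uses `java.math.BigInteger` for clarity rather than speed. It assumes R = 2^256 and the textbook REDC reduction.

```java
import java.math.BigInteger;

// Illustrative only: hypothetical names, BigInteger arithmetic, R = 2^256 assumed.
class MontgomerySketch {
    // P-256 field prime: 2^256 - 2^224 + 2^192 + 2^96 - 1
    static final BigInteger P = BigInteger.ONE.shiftLeft(256)
            .subtract(BigInteger.ONE.shiftLeft(224))
            .add(BigInteger.ONE.shiftLeft(192))
            .add(BigInteger.ONE.shiftLeft(96))
            .subtract(BigInteger.ONE);

    static final int K = 256;                                  // R = 2^K
    static final BigInteger R = BigInteger.ONE.shiftLeft(K);
    static final BigInteger R_MASK = R.subtract(BigInteger.ONE);
    // P' = -P^{-1} mod R, the constant used by REDC
    static final BigInteger P_PRIME = P.modInverse(R).negate().mod(R);

    // REDC: for 0 <= t < P*R, returns t * R^{-1} mod P without dividing by P.
    static BigInteger redc(BigInteger t) {
        BigInteger m = t.and(R_MASK).multiply(P_PRIME).and(R_MASK);
        BigInteger u = t.add(m.multiply(P)).shiftRight(K);     // exact division by R
        return u.compareTo(P) >= 0 ? u.subtract(P) : u;
    }

    // Enter the Montgomery domain: a -> a * R mod P
    static BigInteger toMont(BigInteger a) {
        return a.shiftLeft(K).mod(P);
    }

    // Multiply two Montgomery-form values; the result stays in the domain.
    static BigInteger montMul(BigInteger x, BigInteger y) {
        return redc(x.multiply(y));
    }

    // Exit the Montgomery domain: x -> x * R^{-1} mod P
    static BigInteger fromMont(BigInteger x) {
        return redc(x);
    }

    public static void main(String[] args) {
        BigInteger a = new BigInteger("1234567890123456789012345678901234567890");
        BigInteger b = new BigInteger("98765432109876543210987654321");
        BigInteger viaMont = fromMont(montMul(toMont(a), toMont(b)));
        // Compare against plain modular multiplication.
        System.out.println(viaMont.equals(a.multiply(b).mod(P)));
    }
}
```

The conversions at the boundary are the expensive part, which is why intermediate results are normally kept in Montgomery form and converted back only once at the end.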