Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]
On Tue, 21 May 2024 07:21:14 GMT, Tobias Hartmann wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> shenandoah verifier > > I'm getting some conflicts when trying to apply this to master. Could you > please merge the PR? Hi @TobiHartmann , merged with no issues for me. Could you please run the tests again? (I think Tony did run them, but can't hurt verifying again). Thanks! - PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2123122468
Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]
On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski wrote: >> Performance. Before: >> >> Benchmark(algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt ScoreError Units >> SignatureBench.ECDSA.signSHA256withECDSA1024 256 >> thrpt3 6443.934 ± 6.491 ops/s >> SignatureBench.ECDSA.signSHA256withECDSA 16384 256 >> thrpt3 6152.979 ± 4.954 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA1024 256 >> thrpt3 1895.410 ± 36.979 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 >> thrpt3 1878.955 ± 45.487 ops/s >> Benchmark(algorithm) >> (keyLength) (kpgAlgorithm) (provider) Mode Cnt ScoreError Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt3 1357.810 ± 26.584 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt3 1352.119 ± 23.547 ops/s >> Benchmark (isMontBench) Mode Cnt Score >> Error Units >> PolynomialP256Bench.benchMultiply false thrpt3 1746.126 ± >> 10.970 ops/s >> >> Performance, no intrinsic: >> >> Benchmark(algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.signSHA256withECDSA1024 256 >> thrpt3 6529.839 ± 42.420 ops/s >> SignatureBench.ECDSA.signSHA256withECDSA 16384 256 >> thrpt3 6199.747 ± 133.566 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA1024 256 >> thrpt3 1973.676 ± 54.071 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 >> thrpt3 1932.127 ± 35.920 ops/s >> Benchmark(algorithm) >> (keyLength) (kpgAlgorithm) (provider) Mode Cnt ScoreError Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt3 1355.788 ± 29.858 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt3 1346.523 ± 28.722 ops/s >> Benchmark (isMontBench) Mode Cnt Score >> Error Units >> PolynomialP256Bench.benchMultiply true thrpt3 1919.57... > > Volodymyr Paprotski has updated the pull request incrementally with one > additional commit since the last revision: > > shenandoah verifier I'm getting some conflicts when trying to apply this to master. Could you please merge the PR? - PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2121929550
Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]
On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski wrote: >> Performance. Before: >> >> Benchmark(algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt ScoreError Units >> SignatureBench.ECDSA.signSHA256withECDSA1024 256 >> thrpt3 6443.934 ± 6.491 ops/s >> SignatureBench.ECDSA.signSHA256withECDSA 16384 256 >> thrpt3 6152.979 ± 4.954 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA1024 256 >> thrpt3 1895.410 ± 36.979 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 >> thrpt3 1878.955 ± 45.487 ops/s >> Benchmark(algorithm) >> (keyLength) (kpgAlgorithm) (provider) Mode Cnt ScoreError Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt3 1357.810 ± 26.584 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt3 1352.119 ± 23.547 ops/s >> Benchmark (isMontBench) Mode Cnt Score >> Error Units >> PolynomialP256Bench.benchMultiply false thrpt3 1746.126 ± >> 10.970 ops/s >> >> Performance, no intrinsic: >> >> Benchmark(algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.signSHA256withECDSA1024 256 >> thrpt3 6529.839 ± 42.420 ops/s >> SignatureBench.ECDSA.signSHA256withECDSA 16384 256 >> thrpt3 6199.747 ± 133.566 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA1024 256 >> thrpt3 1973.676 ± 54.071 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 >> thrpt3 1932.127 ± 35.920 ops/s >> Benchmark(algorithm) >> (keyLength) (kpgAlgorithm) (provider) Mode Cnt ScoreError Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt3 1355.788 ± 29.858 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt3 1346.523 ± 28.722 ops/s >> Benchmark (isMontBench) Mode Cnt Score >> Error Units >> PolynomialP256Bench.benchMultiply true thrpt3 1919.57... > > Volodymyr Paprotski has updated the pull request incrementally with one > additional commit since the last revision: > > shenandoah verifier I'll send this through our testing and will report back once it passed. - PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2121914071
Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]
On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski wrote: >> Performance. Before: >> >> Benchmark(algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt ScoreError Units >> SignatureBench.ECDSA.signSHA256withECDSA1024 256 >> thrpt3 6443.934 ± 6.491 ops/s >> SignatureBench.ECDSA.signSHA256withECDSA 16384 256 >> thrpt3 6152.979 ± 4.954 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA1024 256 >> thrpt3 1895.410 ± 36.979 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 >> thrpt3 1878.955 ± 45.487 ops/s >> Benchmark(algorithm) >> (keyLength) (kpgAlgorithm) (provider) Mode Cnt ScoreError Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt3 1357.810 ± 26.584 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt3 1352.119 ± 23.547 ops/s >> Benchmark (isMontBench) Mode Cnt Score >> Error Units >> PolynomialP256Bench.benchMultiply false thrpt3 1746.126 ± >> 10.970 ops/s >> >> Performance, no intrinsic: >> >> Benchmark(algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.signSHA256withECDSA1024 256 >> thrpt3 6529.839 ± 42.420 ops/s >> SignatureBench.ECDSA.signSHA256withECDSA 16384 256 >> thrpt3 6199.747 ± 133.566 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA1024 256 >> thrpt3 1973.676 ± 54.071 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 >> thrpt3 1932.127 ± 35.920 ops/s >> Benchmark(algorithm) >> (keyLength) (kpgAlgorithm) (provider) Mode Cnt ScoreError Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt3 1355.788 ± 29.858 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt3 1346.523 ± 28.722 ops/s >> Benchmark (isMontBench) Mode Cnt Score >> Error Units >> PolynomialP256Bench.benchMultiply true thrpt3 1919.57... > > Volodymyr Paprotski has updated the pull request incrementally with one > additional commit since the last revision: > > shenandoah verifier Thanks Sandhya! Now that I have @ascarpino approval as well, I plan to integrate next Tuesday. - PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2118443577
Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]
On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski wrote: >> Performance. Before: >> >> Benchmark(algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt ScoreError Units >> SignatureBench.ECDSA.signSHA256withECDSA1024 256 >> thrpt3 6443.934 ± 6.491 ops/s >> SignatureBench.ECDSA.signSHA256withECDSA 16384 256 >> thrpt3 6152.979 ± 4.954 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA1024 256 >> thrpt3 1895.410 ± 36.979 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 >> thrpt3 1878.955 ± 45.487 ops/s >> Benchmark(algorithm) >> (keyLength) (kpgAlgorithm) (provider) Mode Cnt ScoreError Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt3 1357.810 ± 26.584 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt3 1352.119 ± 23.547 ops/s >> Benchmark (isMontBench) Mode Cnt Score >> Error Units >> PolynomialP256Bench.benchMultiply false thrpt3 1746.126 ± >> 10.970 ops/s >> >> Performance, no intrinsic: >> >> Benchmark(algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.signSHA256withECDSA1024 256 >> thrpt3 6529.839 ± 42.420 ops/s >> SignatureBench.ECDSA.signSHA256withECDSA 16384 256 >> thrpt3 6199.747 ± 133.566 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA1024 256 >> thrpt3 1973.676 ± 54.071 ops/s >> SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 >> thrpt3 1932.127 ± 35.920 ops/s >> Benchmark(algorithm) >> (keyLength) (kpgAlgorithm) (provider) Mode Cnt ScoreError Units >> o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt3 1355.788 ± 29.858 ops/s >> o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH >> 256 EC thrpt3 1346.523 ± 28.722 ops/s >> Benchmark (isMontBench) Mode Cnt Score >> Error Units >> PolynomialP256Bench.benchMultiply true thrpt3 1919.57... > > Volodymyr Paprotski has updated the pull request incrementally with one > additional commit since the last revision: > > shenandoah verifier Marked as reviewed by sviswanathan (Reviewer). The intrinsics and the C2 changes look good to me. - PR Review: https://git.openjdk.org/jdk/pull/18583#pullrequestreview-2064439617 PR Comment: https://git.openjdk.org/jdk/pull/18583#issuecomment-2118426661
Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]
> Performance. Before: > > Benchmark(algorithm) (dataSize) (keyLength) > (provider) Mode Cnt ScoreError Units > SignatureBench.ECDSA.signSHA256withECDSA1024 256 > thrpt3 6443.934 ± 6.491 ops/s > SignatureBench.ECDSA.signSHA256withECDSA 16384 256 > thrpt3 6152.979 ± 4.954 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA1024 256 > thrpt3 1895.410 ± 36.979 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 > thrpt3 1878.955 ± 45.487 ops/s > Benchmark(algorithm) (keyLength) > (kpgAlgorithm) (provider) Mode Cnt ScoreError Units > o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 > EC thrpt3 1357.810 ± 26.584 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 > EC thrpt3 1352.119 ± 23.547 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply false thrpt3 1746.126 ± > 10.970 ops/s > > Performance, no intrinsic: > > Benchmark(algorithm) (dataSize) (keyLength) > (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.signSHA256withECDSA1024 256 > thrpt3 6529.839 ± 42.420 ops/s > SignatureBench.ECDSA.signSHA256withECDSA 16384 256 > thrpt3 6199.747 ± 133.566 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA1024 256 > thrpt3 1973.676 ± 54.071 ops/s > SignatureBench.ECDSA.verify SHA256withECDSA 16384 256 > thrpt3 1932.127 ± 35.920 ops/s > Benchmark(algorithm) (keyLength) > (kpgAlgorithm) (provider) Mode Cnt ScoreError Units > o.o.b.j.c.full.KeyAgreementBench.EC.generateSecret ECDH 256 > EC thrpt3 1355.788 ± 29.858 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 > EC thrpt3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt3 1919.574 ± > 10.591 ops/s > > Performance, **with intrinsics*... Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: shenandoah verifier - Changes: - all: https://git.openjdk.org/jdk/pull/18583/files - new: https://git.openjdk.org/jdk/pull/18583/files/5c360e35..df4fe6fa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=18583=10 - incr: https://webrevs.openjdk.org/?repo=jdk=18583=09-10 Stats: 7 lines in 2 files changed: 6 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18583.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18583/head:pull/18583 PR: https://git.openjdk.org/jdk/pull/18583