On 2023-07-08 23:24, Torbjörn Granlund wrote:
The result is 14.5% increase in GMPbench value, new value is 9176.2 !

That's much better, but still not as good as it should be with some
optimisation.


I had to upgrade 7600X PC Bios to version 1.24 to avoid CPU burn risk for Ryzen 7000 CPUs. Therefore I took baseline with 1.24 Bios, only small differences to before:
https://github.com/Hermann-SW/QuadraticRegression/commit/89cd05b2cc7de406b8dc6cecc99c4d56ba8fe361

This is the libgmpxx code used to compute "powm(_, qnr, p/4, p)" and measuring runtimes:
https://github.com/Hermann-SW/RSA_numbers_factored/blob/main/c%2B%2B/sqrtm1.cc

While daily snapshot did increase gmpbench value from 8010.2 to 9176.2,
for sqrtm1.cc runtimes increased a bit:

+--------+--------+--------+--------+
! 100355 ! 200700 ! 272770 ! 330855 ! #digits(p) for prime p
+--------+--------+--------+--------+
! 447.1 ! 2144.4 ! 3748.0 ! 5311.2 ! gmp 6.2.1 (baseline with 1.24 Bios)
+--------+--------+--------+--------+
!  453.1 ! 2158.4 ! 3803.1 | 5403.0 ! gmp-6.2.99-20230706074853.tar.zst
+--------+--------+--------+--------+


Specifically, there seems to be something slightly off
with the multiply performance.

Above measurements seem to agree with your statement.


Unfortunately, I don't have Zen4 hardware, nor do I the time to optimise
GMP for Zen4 anytime soon.

Is there any doc or postings on how to do CPU specific optimizations for gmplib?


While determining sqrtm1="sqrt(-1) (mod p)" is really compute intensive for 100Ks digit primes, other computations are superfast. In this libgmpxx code x and y are provided as constants,
the sum of their squares being smallest known 1million digit prime:
https://github.com/Hermann-SW/RSA_numbers_factored/blob/main/c%2B%2B/sqrtm1.smallest_known_1million_digit_prime.cc

Determining sqrtm1 from x and y takes only 0.2s with GMP powm().
Determining sum of squares from sqrtm1 takes long with GMP gcd(),
but takes only 0.23s with GMP based PARI lib halfgcdii():
(I learned about that from Bill Allombert on pari-users mailing list)

hermann@7600x:~/RSA_numbers_factored/c++$ ./sqrtm1.smallest_known_1million_digit_prime
a = y^(-1) (mod p) [powm]; a *= x; a %= p
0.204478s
[M,V] = halfgcdii(sqrtm1, p)
0.226863s
[x,y] = [V[2], M[2,1]]
0s
done
hermann@7600x:~/RSA_numbers_factored/c++$


Regards,

Hermann Stamm-Wilbrandt.
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Reply via email to