Here is the code after some cleanups and tweaks.  The gcd_33 based on
this, compiled with gcc 8.3, runs at 30 cycles per iteration.  (Note:
cycles per iteration, not cycles per bit!)

My best gcd_33 in assembly runs at 10 cycles per iteration.

The former uses memory-based operands.  The latter keeps everything in
registers.

If we wrote an assembly variant of this, and inlined sub_3 and rshift_3,
I would expect it to run at about 15 cycles per iteration.
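
For readers without the attachments at hand, here is a rough C sketch of
what three-limb subtract and right-shift helpers of this kind might look
like.  It is not the code from gcd-mpn.c: the names sub_3_sketch and
rshift_3_sketch, the limb_t typedef, the 64-bit limb size and the
signatures are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit mp_limb_t. */
typedef uint64_t limb_t;

/* r = a - b over three limbs, least significant limb first.
   Returns the borrow out of the top limb (0 or 1).
   r may be the same array as a or b. */
static inline limb_t
sub_3_sketch (limb_t r[3], const limb_t a[3], const limb_t b[3])
{
  limb_t borrow = 0;
  for (int i = 0; i < 3; i++)
    {
      limb_t d = a[i] - b[i];       /* difference mod 2^64 */
      limb_t b1 = a[i] < b[i];      /* borrow from a[i] - b[i] */
      r[i] = d - borrow;
      borrow = b1 | (d < borrow);   /* borrow from subtracting old borrow */
    }
  return borrow;
}

/* r = a >> cnt over three limbs, for 0 < cnt < 64.
   r may be the same array as a. */
static inline void
rshift_3_sketch (limb_t r[3], const limb_t a[3], unsigned cnt)
{
  assert (cnt > 0 && cnt < 64);
  r[0] = (a[0] >> cnt) | (a[1] << (64 - cnt));
  r[1] = (a[1] >> cnt) | (a[2] << (64 - cnt));
  r[2] = a[2] >> cnt;
}
```

Written like this, the operands are arrays, so a compiler may well keep
them in memory unless it inlines the helpers and promotes the limbs to
registers.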

(Timings are for AMD Ryzen.)

Attachment: gcd-mpn.c
Description: Binary data

Attachment: x64-mpn_N.asm
Description: Binary data

-- 
Torbjörn
Please encrypt, key id 0xC8601622
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
