After some cleanups and tweaks, the gcd_33 based on this, compiled with gcc 8.3, runs at 30 cycles per iteration. (Note: that is cycles per iteration, not cycles per bit!)
My best gcd_33 in assembly runs at 10 cycles per iteration. The former uses memory-based operands; the latter keeps everything in registers. If we wrote an assembly variant of this and inlined sub_3 and rshift_3, I would expect it to run at about 15 cycles per iteration. (Timings are for AMD Ryzen.)
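For readers without the attachments at hand, here is a rough sketch of what one "iteration" means in this kind of loop: one compare, one 3-limb subtract, and one right shift that strips all trailing zero bits at once (which is why the figures are per iteration and not per bit). This is not the code in gcd-mpn.c. The signatures of sub_3 and rshift_3, the helper cmp_3, and the name gcd_3limb_sketch are assumptions made for illustration, and the operands are memory-based arrays of three 64-bit limbs, least significant first. __builtin_ctzll is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compare two 3-limb numbers, returns <0, 0 or >0. */
static int cmp_3(const uint64_t *a, const uint64_t *b)
{
    for (int i = 2; i >= 0; i--) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}

/* r = a - b on 3 limbs. Assumes a >= b, so the final borrow is zero.
   r may alias a. */
static void sub_3(uint64_t *r, const uint64_t *a, const uint64_t *b)
{
    uint64_t borrow = 0;
    for (int i = 0; i < 3; i++) {
        uint64_t ai = a[i], bi = b[i];
        uint64_t d = ai - bi;
        uint64_t b1 = ai < bi;      /* borrow out of ai - bi */
        uint64_t b2 = d < borrow;   /* borrow out of d - borrow */
        r[i] = d - borrow;
        borrow = b1 | b2;
    }
}

/* r = a >> cnt on 3 limbs. Requires 1 <= cnt <= 63. r may alias a. */
static void rshift_3(uint64_t *r, const uint64_t *a, unsigned cnt)
{
    r[0] = (a[0] >> cnt) | (a[1] << (64 - cnt));
    r[1] = (a[1] >> cnt) | (a[2] << (64 - cnt));
    r[2] = a[2] >> cnt;
}

/* Binary gcd of two odd 3-limb numbers; result stored in g. */
void gcd_3limb_sketch(uint64_t *g, const uint64_t *a, const uint64_t *b)
{
    uint64_t u[3], v[3];
    for (int i = 0; i < 3; i++) {
        u[i] = a[i];
        v[i] = b[i];
    }
    assert((u[0] & 1) && (v[0] & 1));   /* both inputs must be odd */

    for (;;) {
        int c = cmp_3(u, v);
        if (c == 0)
            break;
        if (c < 0) {                    /* make u the larger operand */
            for (int i = 0; i < 3; i++) {
                uint64_t t = u[i];
                u[i] = v[i];
                v[i] = t;
            }
        }
        sub_3(u, u, v);                 /* odd - odd: even and nonzero */
        while (u[0] == 0) {             /* drop whole zero limbs first */
            u[0] = u[1];
            u[1] = u[2];
            u[2] = 0;
        }
        unsigned cnt = (unsigned) __builtin_ctzll(u[0]);
        if (cnt > 0)                    /* a shift by 64 - 0 would be undefined */
            rshift_3(u, u, cnt);
    }
    for (int i = 0; i < 3; i++)
        g[i] = u[i];
}
```

Called with a = {15, 0, 0} and b = {35, 0, 0}, this leaves {5, 0, 0} in g. Every limb access here goes through memory; an assembly version with inlined subtract and shift could presumably keep u and v in six registers.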
Attachments: gcd-mpn.c, x64-mpn_N.asm
-- 
Torbjörn
Please encrypt, key id 0xC8601622
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel