I'm having problems with the timing of the gcd_11 code. Unfortunately, the nested macros of speed.h make things hard to read. Could you double-check that the operands to gcd_11 are odd and full limbs?
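
Something like the following could serve as a sanity check in front of the timed call. It is only a sketch: limb_t is a stand-in for mp_limb_t assuming 64-bit limbs without nails, "full limb" is taken to mean that the most significant bit is set, and all the helper names are made up for illustration; none of them exist in speed.h.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for mp_limb_t; assumes 64-bit limbs without nails.  */
typedef uint64_t limb_t;

#define LIMB_HIGHBIT ((limb_t) 1 << 63)

/* Nonzero if x is odd and has its most significant bit set.  */
static int
limb_is_odd_and_full (limb_t x)
{
  return (x & 1) != 0 && (x & LIMB_HIGHBIT) != 0;
}

/* Force an arbitrary (e.g. random) limb into that shape.  */
static limb_t
limb_make_odd_and_full (limb_t x)
{
  return x | LIMB_HIGHBIT | 1;
}

/* Hypothetical guard to place just before the timed gcd_11 call.  */
static void
check_gcd_11_operands (limb_t u, limb_t v)
{
  assert (limb_is_odd_and_full (u));
  assert (limb_is_odd_and_full (v));
}
```

If the guard never fires during a speed run, the operand shape can be ruled out as the explanation for the timing difference.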

The odd thing is that gcd_1 seems to outperform gcd_11 in some 1 x 1 cases. I suppose that could happen through gcd_1's initial reduction (which looks different in different .asm files). Or it could happen if the operands are not odd, or if they have different bit counts.

... similar for testing gcd_22.

Speaking of gcd_22: we need to determine this function's interface.

I suppose it will contain 2 or 3 loops, depending on the arch. The first loop will be 22. If the GCD is two limbs, it will finish the job. Else it will invoke either of the following loops.

A possible middle loop will be 21.

The last loop will be 11. We can simply inline a copy here as it is tiny. (A tail call won't work as the functions will have different return types.)

-- 
Torbjörn
Please encrypt, key id 0xC8601622
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel