GMP's gcd performance for two-limb operands is quite poor. The gcd_2 function of mpn/generic/gcd.c uses a loop with an randomly taken branch, and therefore is slow. This function is also not overridable in assembly code. It drops into mpn_gcd_1 except in unlikely cases, but mpn_gcd_1 is overkill and incurs unnecessary overhead as it accepts an n-limb operand.
We have mpn_gcd_1 in assembly for many machines, but no "mpn_gcd_11" which accepts just two 1-limb operands. What can we do? 1. Define mpn_gcd_11. 2. Add a new entry point mpn_gcd_11 to existing assembly mpn_gcd_1. 3. Allow for overrriding gcd_2 in assembly (and probably rename it gcd_22). 4. Implement gcd_22 in assembly for main CPU families. This is not much work. -- Torbjörn Please encrypt, key id 0xC8601622 _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel