I switched focus to a general mpn_gcd_basecase, using the table-based approach Niels and me discussed a week ot two ago here.
I decided to still keep away from data-dependent branches, which may not be the best solution as the operands grow. Ignoring unlikely cases, we have this loop: while (n > 2) { CMP_SWAP (up, vp, n); a = up[0]; b = vp[0]; m = tab[((a & mask) << (4 - 1)) + ((b & mask) >> 1)]; cy = mpn_addmul_1 (up, vp, n, m); a = up[0]; count_trailing_zeros (cnt, a); mpn_rshift (up, up, n, cnt); up[n - 1] |= cy << (GMP_LIMB_BITS - cnt); if (mpn_cmp (up, vp, n) == 0) we're done n -= (up[n - 1] | vp[n - 1]) == 0; } This isn't a great loop. It passes through the operands twice (addmul_1 and rshift, the cmp typically exits immediately). It also uses the only-positive-elements variant of the table, which means much poorer progress. How could it be improved? If we allow data-dependent jumps, we could use a more optimal table. We'd then conditionally call addmul_1 or submul_1. The latter would sometimes yield negative results. We could avoid those by having an "rsbmul_1". We could use a sloppy way of choosing between submul_1 and rsbmul_1, and fixup poor choices when we occasionally got a negative result. Data-dependent jumps will behaps be OK for n more than, say, n=5. I'd like a single run through the data with 3-epsilon bits of progress per iteration, and no branches for typical iteration. We might be able to get rid of rshift by scaling the multiplier m, i.e., if effect let operands' low bit be at a distance from the lowest limb bit. We don't need to limit ourselves to using *mul_1 loops. We could use something slightly more advanced. These algorithms will be useful for CPUs with fast *mul_1 loops. Amd Ryzen and Intel *lake all have such loops which run at less than 2 cycles/limb. Ideas are welcome! Great, working code is even more welcome! Here is the full code for the basic algorithm outlined above: void mpn_gcd_basecase (mp_limb_t *rp, mp_limb_t *up, mp_limb_t *vp, mp_size_t n) { static unsigned char tab[256] = { 31, 21, 19, 9, 7, 29, 27, 17, 15, 5, 3, 25, 23, 13, 11, 1, 29, 31, 25, 27, 21, 23, 17, 19, 13, 15, 9, 11, 5, 7, 1, 3, 27, 9, 31, 13, 3, 17, 7, 21, 11, 25, 15, 29, 19, 1, 23, 5, 25, 19, 5, 31, 17, 11, 29, 23, 9, 3, 21, 15, 1, 27, 13, 7, 23, 29, 11, 17, 31, 5, 19, 25, 7, 13, 27, 1, 15, 21, 3, 9, 21, 7, 17, 3, 13, 31, 9, 27, 5, 23, 1, 19, 29, 15, 25, 11, 19, 17, 23, 21, 27, 25, 31, 29, 3, 1, 7, 5, 11, 9, 15, 13, 17, 27, 29, 7, 9, 19, 21, 31, 1, 11, 13, 23, 25, 3, 5, 15, 15, 5, 3, 25, 23, 13, 11, 1, 31, 21, 19, 9, 7, 29, 27, 17, 13, 15, 9, 11, 5, 7, 1, 3, 29, 31, 25, 27, 21, 23, 17, 19, 11, 25, 15, 29, 19, 1, 23, 5, 27, 9, 31, 13, 3, 17, 7, 21, 9, 3, 21, 15, 1, 27, 13, 7, 25, 19, 5, 31, 17, 11, 29, 23, 7, 13, 27, 1, 15, 21, 3, 9, 23, 29, 11, 17, 31, 5, 19, 25, 5, 23, 1, 19, 29, 15, 25, 11, 21, 7, 17, 3, 13, 31, 9, 27, 3, 1, 7, 5, 11, 9, 15, 13, 19, 17, 23, 21, 27, 25, 31, 29, 1, 11, 13, 23, 25, 3, 5, 15, 17, 27, 29, 7, 9, 19, 21, 31 }; const mp_limb_t mask = (2 << 4) - 2; mpn_zero (rp, n); while (n > 2) { mp_limb_t a, b, m, cy; int cnt; CMP_SWAP (up, vp, n); a = up[0]; b = vp[0]; m = tab[((a & mask) << (4 - 1)) + ((b & mask) >> 1)]; cy = mpn_addmul_1 (up, vp, n, m); a = up[0]; if (UNLIKELY (a == 0)) { int all_zeros = 1; mp_size_t i; for (i = 1; i < n; i++) { if (up[i] != 0) { all_zeros = 0; break; } } if (all_zeros) { ASSERT_ALWAYS (cy != 0); count_trailing_zeros (cnt, cy); up[0] = cy >> cnt; mpn_zero (up + 1, n - 1); } else { mpn_copyi (up, up + i, n - i); up[n - i] = cy; i -= cy != 0; mpn_zero (up + n - i, i); ASSERT_ALWAYS (up[0] != 0); if ((up[0] & 1) == 0) { int cnt; count_trailing_zeros (cnt, up[0]); mpn_rshift (up, up, n - i, cnt); } } } else { count_trailing_zeros (cnt, a); mpn_rshift (up, up, n, cnt); ASSERT_ALWAYS (cnt); up[n - 1] |= cy << (GMP_LIMB_BITS - cnt); } if (mpn_cmp (up, vp, n) == 0) { mpn_copyi (rp, up, n); return; } n -= (up[n - 1] | vp[n - 1]) == 0; } mp_double_limb_t g = mpn_gcd_22 (up[1], up[0], vp[1], vp[0]); rp[0] = g.d0; rp[1] = g.d1; } -- Torbjörn Please encrypt, key id 0xC8601622 _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel