t...@gmplib.org (Torbjörn Granlund) writes: > I think we might get more speed from this algorithm: > > while (highlimb(U) | highlimb(V)) != 0 > S = U - V > T = V - U > cnt = count_trailing_zeros(lowlimb(S)) > cnd1 = U < V > cnd2 = highlimb(U) < highlimb(V) > if (cnd2) > V = U > U = T > else > U = S > if (cnd1 != cnd2) { fix mistake! } > U >>= cnt; > > Why would it be faster? Because while cnd1 needs to wait for the > previous iteration's double-limb shifting, cnd2 depends on just the > single-limb shifting of the high U world.
Interesting. I did something related in the u±v gcd_22 variant, for other reasons. We probably don't need any fixup for the V assignment, I would expect that V <-- min(U,V) can be replaced with V <-- (highlimb(U) < highlimb(V)) ? U : V with no measurable impact on performance. Since it will make a difference for at most one iteration, and we still make reasonable progress when it happens. --- a/mpn/generic/gcd_22.c Sat Aug 17 00:59:04 2019 +0200 +++ b/mpn/generic/gcd_22.c Tue Aug 20 07:51:25 2019 +0200 @@ -52,7 +53,7 @@ mpn_gcd_22 (mp_limb_t u1, mp_limb_t u0, { mp_limb_t vgtu, t1, t0; sub_ddmmss (t1, t0, u1, u0, v1, v0); - vgtu = LIMB_HIGHBIT_TO_MASK(t1); + vgtu = LIMB_HIGHBIT_TO_MASK(u1 - v1); if (UNLIKELY (t0 == 0)) { @@ -91,6 +92,9 @@ mpn_gcd_22 (mp_limb_t u1, mp_limb_t u0, since t0 != 0. */ u0 = (t0 ^ vgtu) - vgtu; u1 = t1 ^ vgtu; + if (UNLIKELY ((mp_limb_signed_t) u1 < 0)) + { + u0 = ~u0; + u1 = -u1; + } if (UNLIKELY (c == GMP_LIMB_BITS)) { u0 = u1; I take it the idea is to have an unlikely branch for this case; attempting a branchless correction of vgtu would miss the point? Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel