t...@gmplib.org (Torbjörn Granlund) writes: > Perhaps gcd_22 should (conditionally) invoke (or tail call) gcd_11, > possibly it should let its caller invoke gcd_11 to finish up an > unfinished job.
We should run the gcd_22 main loop as long as all of u1, u0, v1, v0 are non-zero. After computing {t1, t0} <-- {v1,v0} - {u1,u0}, there are a couple of cases: 1. t1 == t0 == 0: Terminate. Will happen if and only of the gcd is larger than a single limb. 2. t0 == 0. Enter gcd_21 (se below), after some setup to shift out trailing zeros from |v1 - u1|. 3. Common case: Get absolute value and shift out trailing zeros. May produce a zero high limb, in that case fall through to gcd_21, otherwise, continue gcd_22 loop. (We could check for an early zero high limb between steps (2) and (3), i.e., check for (t1 == 0 and no underflow or t1 == ~0 and underflow from the subtraction). But most likely not worth the effort, since these cases are also taken care of by the high limb check in (3).) The gcd_21 loop takes inputs (u1, u0, v), inputs odd, and all the limbs non-zero. It is a much simpler loop than gcd_22 do { {u1, u0} <-- {u1, u0} - v (always > 0) if (u0 == 0) { u0 = u1; break; } shift out trailing zeros } while (u1 > 0) Not sure how many rounds it will typically run, but I'd guess typically few. When the gcd_21 loop exits, we're set for a call to gcd_11. > What should its input requirements be? Should it require one argument, > either argument, or both argument to be odd? If even inputs are allowed, > do we also allow zero input? > > To me it makes some sense to require that common factors of two are > taken care of at top-level, and at least rule out the case of two even > inputs to gcd_11. > > Agreed. Let's say that gcd(u,v) requires an odd u. What are the options for v? If we allow arbitrary v, then it could in principle be implemented as the tail recursive mp_limb_t gcd_11 (mp_limb_t u, mp_limb_t v) { if (v == 0) return u; count_leading_zeros(cnt, v); v >>= count; return gcd_11(min(u,v), |u-v|); } Or we say odd u, non-zero v, the check for zero is moved around, and we'll check for zero only in connection with the u-v subtraction, mp_limb_t gcd_11 (mp_limb_t u, mp_limb_t v) { count_leading_zeros(cnt, v); v >>= count; if (u == v) return u; return gcd_11(min(u,v), |u-v|); } If we also require v odd, then we'll just move the clz and shift a bit, with no really clear benefits. Pro: Avoids initial clz for uses where both inputs are known odd apriori. May be slightly more convenient for the C gcd_11 with masking tricks where initial clz and theone in the loop are a bit different. Cons: Moves the responsibility an initial clz out of the supposedly well optimized asm gcd_11 and out to C code. Call chain down from mpz_gcd with small inputs is particularly important. So I see no strong reason to do it one way or the other. Maybe start with requiring both u and v odd, and relax requirements later in case we find any more tangible benefit from doing that? Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel