ni...@lysator.liu.se (Niels Möller) writes: We should run the gcd_22 main loop as long as all of u1, u0, v1, v0 are non-zero. After computing {t1, t0} <-- {v1,v0} - {u1,u0}, there are a couple of cases:
My main perspective wasn't algorithmic but about code organisation. "Were to do what." Let's compare what you suggest algorithm wise compared to what I implemented. My code stops if either u0-v0 or u1-v1-cy becomes zero (where cy is assigned from the first subtraction). If u0-v0 becomes zero, either u0 or v0 will become zero in the next iteration, of course. My code then bails out as it cannot handle a rightshift of a full limb. The bailout state is essentially a gcd_21 situation. So your suggested criteria and my implemented criteria seem very similar here. Checking u1 = 0 and v1 = 0 separately as you suggest is a different thing, and it might not have zero cost in the gcd_22 loop. If many iterations are expected before one has u1 = v1 = 0, then this is worth a separate loop. (Or perhaps a hardware 2/1 division, but its quotient does not necessarily fit in a single limb!) The gcd_21 loop takes inputs (u1, u0, v), inputs odd, and all the limbs non-zero. It is a much simpler loop than gcd_22 We could do a large rightshift outside the loop and then jump back into and (ab)use gcd_22 with u1 = 0 xor v1 = 0. I suspect random operands will not see any timing difference. Don't you agree that u1 / v1 will not be too far from 1.0 on average? Let's say that gcd(u,v) requires an odd u. What are the options for v? If we allow arbitrary v, then it could in principle be implemented as the tail recursive mp_limb_t gcd_11 (mp_limb_t u, mp_limb_t v) { if (v == 0) return u; count_leading_zeros(cnt, v); v >>= count; return gcd_11(min(u,v), |u-v|); } Cute. But I think we should reuse the fine tuned loops of our many gcd_1.asm. Perhaps we could arrange to enter these loops at a point where we allow for one operand being even. But that would cause a very slight delay in getting things going in the common case of two odd operands (as you mention). So I see no strong reason to do it one way or the other. Maybe start with requiring both u and v odd, and relax requirements later in case we find any more tangible benefit from doing that? Makes sense. -- Torbjörn Please encrypt, key id 0xC8601622 _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel