ni...@lysator.liu.se (Niels Möller) writes: > Checking u1 = 0 and v1 = 0 separately as you suggest is a different > thing, and it might not have zero cost in the gcd_22 loop.
I think only the shifted number should be checked, and as the main loop exit condition. OK, so we agree there! If both u1 > 0 and v1 > 0 on entry (if gcd_22 require that is unclear), then u1 == 0 impliex v1 > 0. I don't think gcd_22 should misbehave for v1 = u1 = 0. It currently is designed to exit for such operands in order to leave things for gcd_11. But wait, I think my current code might fail for u1 = 0, u0 = small, v1 = v0 = 0xfff.fff. That would result in the upper half difference being 0 with carry out. The zero will trigger loop exit currently. Fixable, but we should make sure to test for that (at least if u1 = 0 is allowed). > We could do a large rightshift outside the loop and then jump back into > and (ab)use gcd_22 with u1 = 0 xor v1 = 0. I suspect random operands > will not see any timing difference. Don't you agree that u1 / v1 will > not be too far from 1.0 on average? That would probably work fine, but maybe not simpler than an explicit gcd_22. So then the normal case would call gcd_22 with u1 > 0, v1 > 0, and exit the loop with u1 > 0, v1 == 0. I don't see why we'd need a right shift here, I think one could just check if u1 > 0 and if so jump back to gcd_22, and next time the loop exits we will have u1 = 0, v1 = 0. What do you mean by an "explicit gcd_22"? What I am saying is that we should not design and validate more loops that justified. While gcd_21 would be sleeker than gcd_22, it might not speed up non-contrived examples as after gcd_22 exits because it has too (the case v0 = 0 or u0 = 0) we have an unlikely scenario. If we let it exit if v1 = 0 *or* u1 = 0, as opposed to v1 = 0 and u1 = 0, then we might need gcd_21 in some form, but it will rarely run for more than 1 or two iterations (not enough to train its loop branch to be rightly predicted). I think an explicit gcd_21 loop may be clearer, but unlikely to matter for performance. I'd like to avoid another asm loop if possible. -- Torbjörn Please encrypt, key id 0xC8601622 _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel