"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes: > ... the effect is that in many cases (if U don't need reduction modulo V) > the trailing zeros of U are removed twice.
But you may be right that it's important for performance to avoid a redundant count_trailing_zeros on u. It seems tricky to avoid that, without code duplication or complications in the code flow. For my personal preference, tradeoff threshold between goto and duplicated code is around 5-10 duplicated code lines. Thinking about micro optimizations... Consider > count_trailing_zeros (c, ulimb); > ulimb = (ulimb >> 1) >> c; vs > count_trailing_zeros (c, ulimb); > ulimb >>= (c + 1); I wonder if maybe it's preferable to always use the first variant? Both should compile to three instructions (assuming we have a ctz instruction), ctz + shift + shift, or ctz + add + shift. But with the first variant, the first shift can be issued in parallel with ctz, reducing the critical path of the gcd loop. We'd need some benchmarking to see if it makes a difference. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-bugs mailing list gmp-bugs@gmplib.org https://gmplib.org/mailman/listinfo/gmp-bugs