"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes:

> ... the effect is that in many cases (if U don't need reduction modulo V)
> the trailing zeros of U are removed twice.

But you may be right that it's important for performance to avoid a
redundant count_trailing_zeros on u.

It seems tricky to avoid that, without code duplication or complications
in the code flow. For my personal preference, tradeoff threshold between
goto and duplicated code is around 5-10 duplicated code lines.

Thinking about micro optimizations... Consider

>       count_trailing_zeros (c, ulimb);
>       ulimb = (ulimb >> 1) >> c;

vs

>       count_trailing_zeros (c, ulimb);
>       ulimb >>= (c + 1);

I wonder if maybe it's preferable to always use the first variant? Both
should compile to three instructions (assuming we have a ctz
instruction), ctz + shift + shift, or ctz + add + shift. But with the
first variant, the first shift can be issued in parallel with ctz,
reducing the critical path of the gcd loop.

We'd need some benchmarking to see if it makes a difference.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
gmp-bugs mailing list
gmp-bugs@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-bugs

Reply via email to