Torbjorn Granlund <t...@gmplib.org> writes:

> I looked at the logic following this:
>
>         sbb     U2, U2          C 7 13
>
> You negate the U2 copy in Q2.  It seems that replacing three adc by sbb
> could avoid the neg.

The problem is the final use, where Q2 is added, with carry, to a
different register. It's tempting to replace

        adc     Q1I, Q2

with

        sbb     Q2, Q1I

and a negated Q2, but I'm afraid that will get the sense of the carry
wrong. Do you see any trick to get that right without negating Q2
somewhere along the way?
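
To make the concern concrete, here is a small C model of the two
instructions. It is only a sketch, not GMP code: adc64, sbb64 and
check_carry_sense are hypothetical helpers, and the operand values
are made up. With a negated Q2, sbb computes Q1I + Q2 - CF rather
than Q1I + Q2 + CF, and the flag it leaves behind is a borrow, not
a carry.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "adc src, dst" (AT&T order): dst += src + CF; CF = carry out. */
static uint64_t adc64(uint64_t dst, uint64_t src, unsigned *cf)
{
    uint64_t t = src + *cf;      /* wraps only if src is all ones and CF = 1 */
    unsigned c = t < src;
    uint64_t r = dst + t;
    c |= r < dst;
    *cf = c;
    return r;
}

/* Model of "sbb src, dst" (AT&T order): dst -= src + CF; CF = borrow out. */
static uint64_t sbb64(uint64_t dst, uint64_t src, unsigned *cf)
{
    uint64_t t = src + *cf;      /* wraps only if src is all ones and CF = 1 */
    unsigned b = t < src;
    uint64_t r = dst - t;
    b |= dst < t;
    *cf = b;
    return r;
}

/* Compare "adc Q1I, Q2" with "sbb Q2, Q1I" using a negated Q2. */
void check_carry_sense(void)
{
    uint64_t q1 = 10, q2 = 1;    /* made-up values for Q1I and Q2 */
    unsigned cf;

    /* Carry in clear: same value, but the outgoing flag differs. */
    cf = 0;
    assert(adc64(q2, q1, &cf) == 11 && cf == 0);
    cf = 0;
    assert(sbb64(q1, -q2, &cf) == 11 && cf == 1);

    /* Carry in set: adc adds it, sbb subtracts it. */
    cf = 1;
    assert(adc64(q2, q1, &cf) == 12 && cf == 0);
    cf = 1;
    assert(sbb64(q1, -q2, &cf) == 10 && cf == 1);
}
```

So in this model both the incoming and the outgoing carry would need
to be inverted for the sbb variant to match.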

> It might also be possible to replace the early loop "and" stuff by
> cmov.

Maybe, but the simple way to do conditional addition with lea + cmov
won't do, since we also need carry out.
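
As a rough illustration in C (a sketch with hypothetical helper names,
not the actual loop code): the masked form goes through a real add, so
a carry out is available, while the lea + cmov form only selects
between two values, and lea does not touch the flags.

```c
#include <assert.h>
#include <stdint.h>

/* Masked form: models "and mask, t" followed by "add t, a".
   Returns the sum and stores the carry out of the add in *cy. */
static uint64_t cond_add_masked(uint64_t a, uint64_t b, unsigned cond,
                                unsigned *cy)
{
    uint64_t mask = -(uint64_t) (cond != 0);   /* 0 or all ones */
    uint64_t s = a + (b & mask);
    *cy = s < a;
    return s;
}

/* Select form: models "lea (a,b), t" followed by a cmov into a.
   The sum is the same, but no carry out is produced anywhere. */
static uint64_t cond_add_select(uint64_t a, uint64_t b, unsigned cond)
{
    uint64_t t = a + b;
    return cond ? t : a;
}

void check_cond_add(void)
{
    unsigned cy;

    /* Same low word from both forms; only the masked one reports carry. */
    assert(cond_add_masked(UINT64_MAX, 2, 1, &cy) == 1 && cy == 1);
    assert(cond_add_select(UINT64_MAX, 2, 1) == 1);

    /* Condition false: value unchanged, no carry. */
    assert(cond_add_masked(UINT64_MAX, 2, 0, &cy) == UINT64_MAX && cy == 0);
    assert(cond_add_select(UINT64_MAX, 2, 0) == UINT64_MAX);
}
```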

Does it matter if we do

        mov     B2, r
        and     mask, r

or

        mov     $0, r
        cmovc   B2, r

?
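
In terms of the value left in r, the two sequences agree, as the C
sketch below checks. It assumes that mask is all ones exactly when the
carry is set and zero otherwise; select_by_mask and select_by_cmov are
hypothetical names. What the model cannot show is the side effects:
"and" rewrites the flags (it clears CF), while "mov" and "cmovc" leave
them alone, and the second form needs the carry flag itself to still
be live rather than a mask register.

```c
#include <assert.h>
#include <stdint.h>

/* Models "mov B2, r; and mask, r", with mask derived from the carry. */
static uint64_t select_by_mask(uint64_t b2, unsigned cf)
{
    uint64_t mask = -(uint64_t) (cf != 0);   /* assumption: 0 or all ones */
    return b2 & mask;
}

/* Models "mov $0, r; cmovc B2, r". */
static uint64_t select_by_cmov(uint64_t b2, unsigned cf)
{
    uint64_t r = 0;
    if (cf)
        r = b2;
    return r;
}

void check_select(void)
{
    uint64_t b2 = 0x123456789abcdef0u;       /* made-up value for B2 */

    assert(select_by_mask(b2, 0) == select_by_cmov(b2, 0));
    assert(select_by_mask(b2, 1) == select_by_cmov(b2, 1));
    assert(select_by_mask(b2, 1) == b2);
    assert(select_by_mask(b2, 0) == 0);
}
```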

> To optimise register usage, I sometimes annotate the code with live
> ranges for each register.  That will help with register coalescing.

There are lots of possibilities, since the computations for Q and U are
mostly independent. The data flow is something like

                      load U limb
                           |
                          _V_
          U2, U1I, U0 -> |___| -> U2, U1O, U0 
           \   |    ______/ cy
           _V__V___V_
Q1I, Q0-> |__________|  -> Q1O, Q0
                    \
                     V
               store Q limb

> (T is rather short-lived, perhaps its register could serve two usages?)

It could perhaps be eliminated.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel
