Torbjorn Granlund <t...@gmplib.org> writes:

> I looked at the logic following this:
>
>         sbb     U2, U2          C 7 13
>
> You negate the U2 copy in Q2. It seems that replacing three adc by sbb
> could avoid the neg.
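To make the carry question concrete, here is a small Python model of the x86 adc and sbb instructions on 64-bit limbs. It is an illustrative sketch, not GMP code: the helpers adc, sbb, neg, com and adc_via_complement are hypothetical names, and only the result value and the carry flag (CF) are modeled. It shows that sbb with a negated operand computes a + b - cf rather than a + b + cf. By contrast, the standard one's-complement identity a + b + c = a - ~b - (1 - c) (mod 2^64) reproduces adc exactly, provided both the carry in and the carry out are inverted. Whether that identity fits the register usage in this loop is not established here.

```python
# Model of x86_64 adc/sbb on 64-bit limbs (result value and CF only).
MASK = (1 << 64) - 1


def adc(a, b, cf):
    """x86 'adc': (a + b + cf) mod 2^64, plus the carry out."""
    t = a + b + cf
    return t & MASK, t >> 64


def sbb(a, b, cf):
    """x86 'sbb': (a - b - cf) mod 2^64, plus the borrow out (CF)."""
    t = a - b - cf
    return t & MASK, int(t < 0)


def neg(x):
    return -x & MASK  # two's complement negation, like 'neg'


def com(x):
    return ~x & MASK  # one's complement, like 'not'


def adc_via_complement(a, b, cf):
    """Emulate adc with sbb: complement b, invert carry in and carry out."""
    r, borrow = sbb(a, com(b), 1 - cf)
    return r, 1 - borrow


# With the negated operand, sbb computes a + b - cf instead of a + b + cf.
# Even with cf = 0, where the values agree, its CF output is the inverse
# of adc's only when b != 0.
print(adc(5, 7, 1), sbb(5, neg(7), 1))  # (13, 0) (11, 1)
print(adc(5, 7, 0), sbb(5, neg(7), 0))  # (12, 0) (12, 1)
print(adc(5, 0, 0), sbb(5, neg(0), 0))  # (5, 0) (5, 0)

# The complement form matches adc exactly on a few edge-case limbs.
limbs = [0, 1, 7, 1 << 63, MASK - 1, MASK]
for a in limbs:
    for b in limbs:
        for cf in (0, 1):
            assert adc_via_complement(a, b, cf) == adc(a, b, cf)
```

A practical difference on x86: not leaves the flags untouched, while neg sets CF (to 1 unless the operand is zero). This is the same zero edge case that shows up in the third line above.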
The problem is the final use, where Q2 is added, with carry, to a
different register. It's tempting to replace adc Q1I, Q2 with sbb Q2, Q1I
and negated Q2, but I'm afraid that will get the sense of the carry wrong.
Do you see any trick to get that right without negating Q2 somewhere along
the way?

> It might also be possible to replace the early loop "and" stuff by
> cmov.

Maybe, but the simple way to do conditional addition with lea + cmov won't
do, since we also need carry out. Does it matter if we do

        mov     B2, r
        and     mask, r

or

        mov     $0, r
        cmovc   B2, r

?

> To optimise register usage, I sometimes annotate the code with live
> ranges for each register. That will help with register coalescing.

There are lots of possibilities, since the computations for Q and U are
mostly independent. The data flow is something like

                 load U limb
                      |
                     _V_
     U2, U1I, U0 -> |___| -> U2, U1O, U0
                     \   |   ______/ cy
                     _V__V___V_
         Q1I, Q0 -> |__________| -> Q1O, Q0
                         \
                          V
                     store Q limb

> (T is rather short-lived, perhaps its register could serve two usages?)

It could perhaps be eliminated.

Regards,
/Niels

--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel
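As an illustration of the conditional-addition question in the message above, the following Python sketch models the three variants it mentions: and with an all-ones/zero mask, mov $0 followed by cmovc, and lea followed by cmov. It is a hypothetical model. The function names are made up, cond stands in for CF, and the subsequent add of r into the U limb is an assumption. It says nothing about speed. It only shows that the first two variants give the same sum and carry out, while lea + cmov gives the right sum but no carry.

```python
# Model of three ways to add B2 to a 64-bit limb u only when cond is set.
MASK = (1 << 64) - 1


def cond_add_and(u, b2, cond):
    """mov B2, r / and mask, r, then (assumed) add r into u."""
    mask = -cond & MASK  # all ones iff cond, e.g. as 'sbb mask, mask' yields
    r = b2 & mask
    t = u + r
    return t & MASK, t >> 64  # sum and carry out


def cond_add_cmovc(u, b2, cond):
    """mov $0, r / cmovc B2, r, then (assumed) add r into u."""
    r = b2 if cond else 0
    t = u + r
    return t & MASK, t >> 64  # sum and carry out


def cond_add_lea_cmov(u, b2, cond):
    """lea forms u + b2 without touching flags; cmov selects the result.
    The sum is right, but no carry out is produced."""
    t = (u + b2) & MASK
    return t if cond else u


u, b2 = MASK - 2, 5
print(cond_add_and(u, b2, 1), cond_add_cmovc(u, b2, 1))  # (2, 1) (2, 1)
print(cond_add_lea_cmov(u, b2, 1))  # 2, and the carry is lost

for cond in (0, 1):
    for x in (0, 1, MASK):
        for y in (0, 1, MASK):
            assert cond_add_and(x, y, cond) == cond_add_cmovc(x, y, cond)
            assert cond_add_lea_cmov(x, y, cond) == cond_add_and(x, y, cond)[0]
```

One detail the model cannot show is that mov $0, r leaves CF intact. The common xor r, r zeroing idiom clears CF, so it could not stand in for the mov right before cmovc.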