ni...@lysator.liu.se (Niels Möller) writes:
The problem is the final use, where Q2 is added, with carry, to a
different register. It's tempting to replace

        adc Q1I, Q2

with

        sbb Q2, Q1I

and negated Q2, but I'm afraid that will get the sense of the carry
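
For reference, a small C model of the flag arithmetic behind this worry. It is only a sketch: `flagged_t`, `model_adc` and `model_sbb` are names made up here, not anything from the GMP sources, and the model covers the result and the carry flag only. It shows that subtracting the negated operand gives the same value as adding the original one only when the incoming carry is clear, and that the outgoing carry comes out complemented whenever the operand is nonzero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: 64-bit result plus the carry flag (0 or 1). */
typedef struct { uint64_t value; unsigned carry; } flagged_t;

/* Model of adc: dst + src + CF, with CF set on unsigned overflow. */
static flagged_t model_adc(uint64_t dst, uint64_t src, unsigned cf)
{
  flagged_t r;
  uint64_t t = dst + src;
  assert(cf <= 1);
  r.value = t + cf;
  /* At most one of the two additions can wrap around. */
  r.carry = (t < dst) | (r.value < t);
  return r;
}

/* Model of sbb: dst - src - CF, with CF set on borrow. */
static flagged_t model_sbb(uint64_t dst, uint64_t src, unsigned cf)
{
  flagged_t r;
  uint64_t t = dst - src;
  assert(cf <= 1);
  r.value = t - cf;
  /* At most one of the two subtractions can borrow. */
  r.carry = (dst < src) | (t < cf);
  return r;
}
```

With a clear incoming carry, `model_adc(5, 1, 0)` and `model_sbb(5, -1, 0)` (the second operand taken modulo 2^64) both give the value 6, but the first leaves the carry clear and the second sets it. With the incoming carry set, the two values differ by 2, since adc adds the carry and sbb subtracts it.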
Torbjorn Granlund writes:
> I looked at the logic following this:
>
>         sbb U2, U2      C 7 13
>
> You negate the U2 copy in Q2. It seems that replacing three adc by sbb
> could avoid the neg.
The problem is the final use, where Q2 is added, with carry, to a
different register. It's temptin
I looked at the logic following this:

        sbb U2, U2      C 7 13

You negate the U2 copy in Q2. It seems that replacing three adc by sbb
could avoid the neg.
It might also be possible to replace the early loop "and" stuff by cmov.
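
To make the two idioms concrete, here is a rough C model. An sbb of a register with itself computes r - r - CF, so it leaves 0 or all ones depending on the carry, and that mask can then gate an operand with "and". A cmov-style select reaches the same result without building the mask. The function names and the conditional-add use case are assumptions for illustration; what the loop actually masks is not shown in this thread.

```c
#include <stdint.h>

/* Model of "sbb r, r": r - r - CF is 0 when CF = 0 and all ones when CF = 1. */
static uint64_t mask_from_carry(unsigned cf)
{
  return (uint64_t)0 - (uint64_t)(cf & 1);
}

/* Mask style: add d to x only when the carry was set. */
static uint64_t cond_add_mask(uint64_t x, uint64_t d, unsigned cf)
{
  return x + (d & mask_from_carry(cf));
}

/* Select style: what a cmov would do, picking d or 0 as the addend. */
static uint64_t cond_add_select(uint64_t x, uint64_t d, unsigned cf)
{
  uint64_t addend = 0;
  if (cf & 1)
    addend = d;
  return x + addend;
}
```

Both functions return the same value for either carry state. Whether cmov is actually faster than sbb plus and depends on the target chip and on register pressure, which this model says nothing about.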
Note that the carry flag survives dec, although that causes a pi
Torbjorn Granlund writes:
> On Intel chips, op-to-mem is expensive. Even op-from-memory is often
> slower than load+op. (I understand the register shortage problem.)
The following (untested) variant needs one register too many.
UP, QP, UN: Load, store, loop counter.
DINV, B2, B2
ni...@lysator.liu.se (Niels Möller) writes:
Will try that. I think one could also try to delay the quotient store
one iteration, keeping "Q1" in a register until the next iteration. Then
one gets rid of the

        adc Q2,8(QP, UN, 8)

in the loop, using only a single store per it
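
A simplified C sketch of the delayed-store idea, under loud assumptions: the arrays `q1` and `adj` stand in for the per-iteration quotient limb and the adjustment that the memory adc applies to the limb stored one iteration earlier, the loop runs from the top limb down, and carry propagation out of the adjusted limb is ignored. None of these names come from the actual loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Variant A: store every limb at once, then fix up the previously stored
   limb with a read-modify-write on memory (the role of the memory adc). */
static void store_then_fixup(uint64_t *qp, const uint64_t *q1,
                             const uint64_t *adj, size_t n)
{
  for (size_t i = n; i-- > 0;) {
    if (i + 1 < n)
      qp[i + 1] += adj[i];      /* op-to-memory on the previous limb */
    qp[i] = q1[i];
  }
}

/* Variant B: keep the newest limb in a register ("pending") and store it
   one iteration later, already adjusted: one plain store per iteration. */
static void delayed_store(uint64_t *qp, const uint64_t *q1,
                          const uint64_t *adj, size_t n)
{
  if (n == 0)
    return;
  uint64_t pending = q1[n - 1];
  for (size_t i = n - 1; i-- > 0;) {
    qp[i + 1] = pending + adj[i];
    pending = q1[i];
  }
  qp[0] = pending;              /* final store after the loop */
}
```

Both variants fill `qp` with the same limbs. The second one trades the read-modify-write for one extra live register, which is exactly the resource that is short here.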