I played more with the code, now trying to break the add-adc-sbb-cmov
chain, for the benefit of most Intel processors.

But I lack unit testing code for the function, making hacking quite
cumbersome.  I don't feel safe hacking *any* GMP assembly code without
tests/devel/try.c's function and access checks.
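
To make the two kinds of check concrete, here is a minimal sketch in C.
It is not tests/devel/try.c and does not use its interfaces.  Every name
in it (limb_t, check_add_n, overrun_add_n, GUARD) is made up for
illustration, and an add_n-style routine stands in for the function
actually being hacked.  The "function check" compares a candidate against
a simple reference on random operands; the "access check" here only uses
guard words around the destination, a much weaker stand-in for real
redzones.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for mp_limb_t */
typedef limb_t (*add_n_fn)(limb_t *rp, const limb_t *up,
                           const limb_t *vp, size_t n);

#define GUARD 0xDEADBEEFCAFEF00DULL   /* arbitrary guard pattern */
#define MAXN  32

/* Simple reference: rp[] = up[] + vp[], returns the carry-out. */
limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + cy;
        limb_t c1 = s < cy;          /* carry from adding carry-in */
        limb_t r = s + vp[i];
        limb_t c2 = r < s;           /* carry from adding vp[i] */
        rp[i] = r;
        cy = c1 | c2;
    }
    return cy;
}

/* Deliberately broken candidate: right result, but writes rp[n]. */
limb_t overrun_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = ref_add_n(rp, up, vp, n);
    rp[n] = 0;                       /* out-of-bounds store */
    return cy;
}

/* Random limb, biased towards all-ones to provoke long carry chains. */
limb_t rnd_limb(void)
{
    if ((rand() & 3) == 0)
        return ~(limb_t)0;
    limb_t x = 0;
    for (int i = 0; i < 5; i++)
        x = (x << 15) ^ (limb_t)(rand() & 0x7fff);
    return x;
}

/* Returns 0 on success, 1 if the function check fails (wrong limbs or
   carry), 2 if the access check fails (a guard word was overwritten). */
int check_add_n(add_n_fn cand, unsigned seed, int iters)
{
    limb_t up[MAXN], vp[MAXN], want[MAXN], got[MAXN + 2];
    srand(seed);
    for (int it = 0; it < iters; it++) {
        size_t n = 1 + (size_t)rand() % MAXN;
        assert(n <= MAXN);
        for (size_t i = 0; i < n; i++) {
            up[i] = rnd_limb();
            vp[i] = rnd_limb();
        }
        got[0] = got[n + 1] = GUARD;          /* guards around rp[0..n-1] */
        limb_t cw = ref_add_n(want, up, vp, n);
        limb_t cg = cand(got + 1, up, vp, n);
        if (cw != cg || memcmp(want, got + 1, n * sizeof(limb_t)) != 0)
            return 1;
        if (got[0] != GUARD || got[n + 1] != GUARD)
            return 2;
    }
    return 0;
}
```

In real use the candidate would be the assembly routine under test, and
guard words would not catch stray reads at all, which is one reason a
sketch like this is no substitute for proper redzone checks.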

The changes I wanted to try were:

(1) Shorten a dep chain, and avoid keeping CF live over an inc
    instruction.  The cmov doesn't really depend on sbb, since the
    latter insn never really changes carry.  (This might btw be useful
    to teach loopmixer!)

(2) Reallocate Q2 to an "old" register (not r8-r15) and then use the
    32-bit form of "adc $0,reg".  That form is shorter.

(3) Offset UP to avoid the offset in the loop.  The offset form has
    longer load latency on some Intel CPUs.  Also try the non-indexed
    form for QP and UP.

-- 
Torbjörn
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel
