I played more with the code, now trying to break the add-adc-sbb-cmov chain, for the benefit of most Intel processors.
But I lack unit testing code for the function, which makes hacking quite cumbersome. I don't feel safe hacking *any* GMP assembly code without tests/devel/try.c's function and access checks.

The changes I wanted to try were:

(1) Shorten a dep chain, and avoid keeping CF live over an inc instruction. The cmov doesn't really depend on sbb, since the latter insn never really changes carry. (This might btw be useful to teach loopmixer!)

(2) Reallocate Q2 to an "old" register (not r8-r15) and then use the 32-bit form of "adc $0,reg". That form is one byte shorter, since it needs no REX prefix.

(3) Offset UP to avoid the offset in the loop. The offset form has longer load latency on some Intel CPUs. Also try the non-indexed form for QP and UP.

-- 
Torbjörn