ni...@lysator.liu.se (Niels Möller) writes:

> You could always do the two's complement of one of the operands on the
> fly, and then use the same add with carry instructions as in add_n.
>
> I'm thinking aloud, so I'm sorry if I get this wrong, but I think it's
> best to handle the unlikely case of low zero limbs up front. Then it's
> a plain negate of the first non-zero limb, and a plain complement for
> the remaining limbs; the important thing here is that the negation
> generates no additional carries to propagate.
>
> So compared to add_n, you just get an additional xor with -1 in the
> loop (and not on the loop's critical path). I can't guess whether or
> not that will be visible in the execution time.

For sub_n, I suppose

    ldx
    ldx
    xnor    (with %g0)
    addxcc
    stx

would be the right mix. This should run in 2.5 + epsilon c/l, if
properly software pipelined. 4x unrolling should give 2.75 c/l, unless
they stick in some pipeline bubbles for taken branches. For add_n,
things should run 0.5 c/l faster. (I am assuming it is a 2-way
pipeline.)

-- 
Torbjörn
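
For reference, here is a minimal C sketch of what the sub_n instruction mix above computes. It is not GMP's code. It assumes 64-bit limbs stored least significant first, and limb_t and sub_n_via_complement are hypothetical names made up for the illustration. It relies on a - b == a + ~b + 1, so the only change from an add_n loop is the complement of one operand and an initial carry of 1.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for a 64-bit limb type */

/* Sketch only: rp[] = ap[] - bp[] over n limbs, least significant limb
   first.  Returns the borrow out (0 or 1).  Uses complement plus
   add-with-carry instead of a subtract-with-borrow. */
static limb_t
sub_n_via_complement (limb_t *rp, const limb_t *ap, const limb_t *bp,
                      size_t n)
{
  limb_t cy = 1;                 /* the "+ 1" of the two's complement */

  for (size_t i = 0; i < n; i++)
    {
      limb_t a  = ap[i];         /* ldx */
      limb_t nb = ~bp[i];        /* ldx, then xnor with %g0 */
      limb_t s  = a + nb;        /* addxcc, first half: a + ~b */
      limb_t c1 = s < a;         /* carry out of a + ~b */
      limb_t r  = s + cy;        /* addxcc, second half: add carry in */
      limb_t c2 = r < s;         /* carry out of adding the carry in */

      rp[i] = r;                 /* stx */
      cy = c1 | c2;              /* c1 and c2 are never both set */
    }

  return cy ^ 1;                 /* carry out 1 means no borrow */
}
```

In C the carry has to be recomputed with comparisons, which the addxcc instruction gets for free from the condition codes, so this shows the arithmetic only and says nothing about the cycle counts.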