David Miller <da...@davemloft.net> writes:

> As an aside I think we can get it down to 2.5 cycles per limb on T4
> with 4-way unrolling, and 3.0 cycles per limb with 2-way unrolling.
>
> The idea is to decrease the bookkeeping instructions by only
> maintaining base pointers which do not change, and then we have an
> offset which operates as the loop index.
>
> So we'd have an 'n_off' instead of 'n', and then in some local
> registers we'd hold:
>
>   l3: up - 8
>   l4: up - 16
>   l5: rp - 8
>   l6: rp - 16

A clever trick! But you will probably get 2.75 c/l for 4-way, not
2.5 c/l. We'll need infinite unrolling for 2.5...
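For readers who want to see the offset-as-loop-index idea outside assembly, here is a minimal C sketch. It is not the code under discussion. The routine add_1_offset and the type limb_t are hypothetical names chosen for illustration, and the operation (adding a single limb to an n-limb number) was picked only because it is short. The sketch also swaps in a closely related variant: the fixed bases point at the end of each operand and a negative offset counts up to zero, which keeps every pointer inside its array as C requires. On SPARC, which has no register + register + immediate addressing mode, the "off + 1" accesses would come from a second pre-biased base register per operand, which is what the l3..l6 assignments above provide.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for a 64-bit limb */

/* Hypothetical example routine: rp[0..n-1] = up[0..n-1] + b.
   Returns the carry out of the most significant limb. */
limb_t add_1_offset(limb_t *rp, const limb_t *up, size_t n, limb_t b)
{
    /* Fixed base pointers: computed once, never updated in the loop. */
    const limb_t *u_end = up + n;
    limb_t *r_end = rp + n;

    /* The only loop variable: runs from -n up to 0. */
    ptrdiff_t off = -(ptrdiff_t)n;
    limb_t cy = b;

    /* Peel one limb when n is odd so the main loop handles pairs. */
    if (n & 1) {
        limb_t s = u_end[off] + cy;
        cy = s < cy;            /* unsigned wraparound means carry */
        r_end[off] = s;
        off += 1;
    }

    /* 2-way unrolled body: one add to off serves as both the pointer
       update and the loop counter, and the exit test is against zero. */
    for (; off != 0; off += 2) {
        limb_t s0 = u_end[off] + cy;
        cy = s0 < cy;
        r_end[off] = s0;

        limb_t s1 = u_end[off + 1] + cy;
        cy = s1 < cy;
        r_end[off + 1] = s1;
    }
    return cy;
}
```

Compared with advancing up and rp and decrementing n separately, the loop body keeps a single bookkeeping update. If the overhead that remains costs about one cycle per iteration (an assumption, though it fits both the 3.0 c/l and 2.75 c/l figures above), k-way unrolling gives roughly 2.5 + 1/k cycles per limb, which reaches 2.5 only in the limit.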
--
Torbjörn