From: Torbjorn Granlund <[email protected]>
Date: Tue, 26 Mar 2013 21:18:26 +0100

> David Miller <[email protected]> writes:
>
> L(top):
>     or      %g4, %g1, %l1
>     sllx    %g2, cnt, %g1
>
>     srlx    %g2, tcnt, %g4
>     ldx     [up - 8], %g2
>
>     stx     %l1, [rp - 8]
>     or      %g3, %l2, %l7
>
>     sllx    %g5, cnt, %l2
>     srlx    %g5, tcnt, %g3
>
>     ldx     [up - 16], %g5
>     sub     up, 16, up
>
>     stx     %l7, [rp - 16]
>     sub     rp, 16, rp
>
>     brgz    n, L(top)
>     add     n, -2, n
>
> It has lost some symmetry, which would be nice to keep.  Is it slower
> in the operation order I suggested?

In what way has symmetry been lost?

For odd 'n' we can branch to the first instruction after the first
store in the loop, and it should work just fine.

The only thing I did was transpose some "or/sllx" pairs; I tried to
keep the major blocks grouped the same.
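
For anyone following along without the full source, here is a rough C sketch of what a two-limbs-per-iteration left shift like the loop above computes. It is only an illustration under my own assumptions: the name lshift_sketch, the 64-bit limb type, and the high-to-low processing order are hypothetical, and this is not GMP's actual code. C cannot conveniently jump into the middle of a loop, so the odd case is handled with one peeled step before the loop instead of a branch past the first store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not GMP's mpn_lshift.
   Shift the n-limb number up[0..n-1] left by cnt bits (1 <= cnt <= 63),
   store the low n limbs in rp[], and return the bits shifted out of the
   most significant limb.  Limbs are processed from high to low. */
uint64_t lshift_sketch(uint64_t *rp, const uint64_t *up, size_t n, unsigned cnt)
{
    assert(n >= 1 && cnt >= 1 && cnt <= 63);

    unsigned tcnt = 64 - cnt;        /* complementary shift count */
    size_t i = n - 1;
    uint64_t retval = up[i] >> tcnt; /* bits that fall off the top */
    uint64_t pending = up[i] << cnt; /* high part of rp[i], low part still missing */

    /* Odd number of limbs left below i: peel one step so the main loop
       can always consume two limbs per iteration. */
    if (i & 1) {
        uint64_t lo = up[i - 1];
        rp[i] = pending | (lo >> tcnt);
        pending = lo << cnt;
        i--;
    }

    /* Main loop: two loads, two sllx/srlx-style shift pairs, two ors,
       two stores per iteration. */
    while (i >= 2) {
        uint64_t a = up[i - 1];
        uint64_t b = up[i - 2];
        rp[i] = pending | (a >> tcnt);
        rp[i - 1] = (a << cnt) | (b >> tcnt);
        pending = b << cnt;
        i -= 2;
    }

    rp[0] = pending;                 /* lowest limb gets zeros shifted in */
    return retval;
}
```

The order of the shifts and ors inside the loop body is the part that can be rearranged for scheduling without changing the result, as long as each store still sees the matching or.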
