From: Torbjorn Granlund <[email protected]>
Date: Tue, 26 Mar 2013 21:18:26 +0100

> David Miller <[email protected]> writes:
> 
>   L(top):
>           or      %g4, %g1, %l1
>           sllx    %g2, cnt, %g1
>   
>           srlx    %g2, tcnt, %g4
>           ldx     [up - 8], %g2
>   
>           stx     %l1, [rp - 8]
>           or      %g3, %l2, %l7
>   
>           sllx    %g5, cnt, %l2
>           srlx    %g5, tcnt, %g3
>   
>           ldx     [up - 16], %g5
>           sub     up, 16, up
>   
>           stx     %l7, [rp - 16]
>           sub     rp, 16, rp
>   
>           brgz    n, L(top)
>            add    n, -2, n
>   
> It has lost some symmetry, which would be nice to keep.  Is it slower
> in the operation order I suggested?

In what way has symmetry been lost?  For odd modulus of 'n' we can
branch to the first instruction after the first store in the loop, and
it should work just fine.

The only thing I did was transpose some "or/sllx" pairs; I tried to
keep the major blocks grouped the same.
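
To illustrate the odd-count entry point, here is a rough C model of a
2-way unrolled left shift that walks from the high limb down, as the
quoted loop does. It is only a sketch, not the SPARC code under
discussion: the function name lshift_unrolled2, the 64-bit limb type and
the calling convention are assumptions made for illustration. When an odd
number of stores remains, control jumps to just after the first store of
the loop body.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: shift the n-limb number up[] left by cnt bits,
   store the result in rp[], and return the bits shifted out of the top.
   Assumes n >= 1 and 1 <= cnt <= 63. */
uint64_t lshift_unrolled2(uint64_t *rp, const uint64_t *up,
                          size_t n, unsigned cnt)
{
    assert(n >= 1 && cnt >= 1 && cnt <= 63);

    unsigned tcnt = 64 - cnt;
    size_t i = n - 1;               /* number of stores left before rp[0] */
    uint64_t cur = up[i];
    uint64_t out = cur >> tcnt;     /* bits shifted out of the top limb */
    uint64_t hi = cur << cnt;       /* high part of the next stored limb */

    if (i & 1)
        goto odd;                   /* odd count: enter after first store */

    while (i > 0) {
        /* first half of the unrolled body */
        cur = up[i - 1];
        rp[i] = hi | (cur >> tcnt);
        hi = cur << cnt;
        i--;
odd:
        /* second half of the unrolled body */
        cur = up[i - 1];
        rp[i] = hi | (cur >> tcnt);
        hi = cur << cnt;
        i--;
    }

    rp[0] = hi;                     /* lowest limb has nothing below it */
    return out;
}
```

Both halves of the body are identical here, which is what makes the
mid-loop entry safe in this model: after the jump the remaining count is
even and the loop proceeds two limbs at a time.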
_______________________________________________
gmp-devel mailing list
[email protected]
http://gmplib.org/mailman/listinfo/gmp-devel
