From: Torbjorn Granlund <t...@gmplib.org>
Date: Sun, 06 Jan 2013 13:48:42 +0100

> I recommend 4-way unrolling.
>
> The summation method of mpn/powerpc64/mode64/aorsmul_1.asm might be
> best.

Thanks for all of these pointers and suggestions.

While waiting for the FSF to execute my assignment, I tweaked my
existing 2-way unrolled mul_1 and addmul_1 loops. Currently on T4 I'm
at:

mul_1		3.8 cycles/limb

L(top):
	ldx	[up+0], %g1
	sub	n, 2, n
	ldx	[up+8], %o4
	mulx	%g1, v0, %g3
	add	up, 16, up
	umulxhi	%g1, v0, %g2
	mulx	%o4, v0, %g1
	add	rp, 16, rp
	addxccc	%g3, %o5, %g3
	umulxhi	%o4, v0, %o5
	stx	%g3, [rp-16]
	addxccc	%g1, %g2, %g1
	brgz	n, L(top)
	 stx	%g1, [rp-8]

addmul_1	5.5 cycles/limb

L(top):
	ldx	[up+0], %l0
	ldx	[up+8], %l1
	ldx	[rp+0], %l2
	ldx	[rp+8], %l3
	mulx	%l0, v0, %o0
	add	up, 16, up
	umulxhi	%l0, v0, %o1
	add	rp, 16, rp
	mulx	%l1, v0, %o2
	sub	n, 2, n
	umulxhi	%l1, v0, %o3
	addxccc	%o0, %o5, %o0
	addxccc	%o2, %o1, %o2
	addxc	%g0, %o3, %o5
	addcc	%o0, %l2, %o0
	stx	%o0, [rp-16]
	addxccc	%o2, %l3, %o2
	brgz	n, L(top)
	 stx	%o2, [rp-8]

Just FYI...
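
For anyone reading along without a SPARC manual to hand, here is a rough
C model of what mul_1 and addmul_1 compute. It is only a sketch for
checking results against, not GMP's code and not a translation of the
loops above. The names limb_t, ref_mul_1, ref_addmul_1 and mul_1_2way
are made up for illustration, and unsigned __int128 is assumed to be
available (a GCC/Clang extension). mul_1_2way loosely mirrors the 2-way
carry chain: the high half of the second product is kept in a variable
playing the role of %o5, and the carry flag is kept across iterations.
It assumes an even, nonzero limb count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t; /* hypothetical name for a 64-bit limb */

/* rp[0..n-1] = up[0..n-1] * v0; returns the carry-out limb. */
static limb_t ref_mul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v0)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* 64x64 -> 128 bit product plus incoming carry cannot overflow. */
        unsigned __int128 p = (unsigned __int128)up[i] * v0 + carry;
        rp[i] = (limb_t)p;
        carry = (limb_t)(p >> 64);
    }
    return carry;
}

/* rp[0..n-1] += up[0..n-1] * v0; returns the carry-out limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v0)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* Worst case is (2^64-1)^2 + 2*(2^64-1) = 2^128-1, so this fits. */
        unsigned __int128 p = (unsigned __int128)up[i] * v0 + rp[i] + carry;
        rp[i] = (limb_t)p;
        carry = (limb_t)(p >> 64);
    }
    return carry;
}

/* Two limbs per iteration, with the low/high halves summed separately. */
static limb_t mul_1_2way(limb_t *rp, const limb_t *up, size_t n, limb_t v0)
{
    assert(n >= 2 && n % 2 == 0); /* assumption: no odd-limb handling here */
    limb_t hi_prev = 0; /* plays the role of %o5 in the mul_1 loop */
    unsigned cy = 0;    /* plays the role of the carry flag */
    for (size_t i = 0; i < n; i += 2) {
        unsigned __int128 p0 = (unsigned __int128)up[i] * v0;
        unsigned __int128 p1 = (unsigned __int128)up[i + 1] * v0;

        /* low(p0) + previous high half + carry flag */
        unsigned __int128 s0 = (unsigned __int128)(limb_t)p0 + hi_prev + cy;
        rp[i] = (limb_t)s0;
        cy = (unsigned)(s0 >> 64);

        /* low(p1) + high(p0) + carry flag */
        unsigned __int128 s1 =
            (unsigned __int128)(limb_t)p1 + (limb_t)(p0 >> 64) + cy;
        rp[i + 1] = (limb_t)s1;
        cy = (unsigned)(s1 >> 64);

        hi_prev = (limb_t)(p1 >> 64);
    }
    /* The high half of a product is at most 2^64-2, so adding cy is safe. */
    return hi_prev + cy;
}
```

Running the reference and the 2-way model over the same inputs and
comparing limbs plus the returned carry is a cheap way to sanity-check
a carry chain before timing it.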