From: ni...@lysator.liu.se (Niels Möller)
Date: Fri, 04 Jan 2013 09:10:30 +0100
> David Miller <da...@davemloft.net> writes:
>
>> That's why realistically I'll probably only use mpmul for 3x3 and
>> larger.
>
> So, e.g., an mpn_addmul_4 would make sense (and up to mpn_addmul_32, if
> you want to make maximal use of mpmul...)? I don't know anything about
> these sparc instructions beyond what you're explaining now, but it
> sounds like it could be advantageous to have one operand invariant in
> the loop.

I still think using it in mpn_mul_basecase when N==M is going to be
the best use of this instruction.

The issue is that every time you want to use 'mpmul' you have to do
something like this (for a 4x4 limb multiply):

    ldx     [MULTIPLIER + 0x00], %o0
    ldx     [MULTIPLIER + 0x08], %o1
    ldx     [MULTIPLIER + 0x10], %o2
    ldx     [MULTIPLIER + 0x18], %o3
    ...
    save    %sp, -176, %sp
    ldx     [MULTIPLICAND + 0x00], %l0
    ldx     [MULTIPLICAND + 0x08], %l1
    ldx     [MULTIPLICAND + 0x10], %l2
    ldx     [MULTIPLICAND + 0x18], %l3
    ...
    save    %sp, -176, %sp
    save    %sp, -176, %sp
    save    %sp, -176, %sp
    save    %sp, -176, %sp
    save    %sp, -176, %sp
    mpmul   3               ! The immediate field is "N - 1"
    restore
    restore
    restore
    restore
    stx     %l0, [PRODUCT + 0x00]
    stx     %l1, [PRODUCT + 0x08]
    stx     %l2, [PRODUCT + 0x10]
    stx     %l3, [PRODUCT + 0x18]
    stx     %l4, [PRODUCT + 0x20]
    stx     %l5, [PRODUCT + 0x28]
    stx     %l6, [PRODUCT + 0x30]
    stx     %l7, [PRODUCT + 0x38]
    restore
    restore

The circuit does scale very well. For example, here are the cycle
counts for just the 'mpmul' instruction itself, for N from 1 to 16:

    N:          1    2    3    4    5    6    7    8
    cycles:    79   84   90   98  108  120  134  150

    N:          9   10   11   12   13   14   15   16
    cycles:   168  188  210  234  260  288  318  350

Anyway, I obviously have a lot of experimenting and tinkering to do,
so I'll come back here once I have a better idea of how we might use
'mpmul' most effectively.

Thanks.
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel
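Going by the 4x4 example in the post (four multiplier limbs, four multiplicand limbs, eight product limbs stored), mpmul yields an N x N limb product with a 2N-limb result, the same result mpn_mul_basecase produces when N==M. The Python sketch below is a reference model under stated assumptions, not a definition of the instruction. It assumes 64-bit limbs stored least significant first, as in GMP's mpn layer; the post does not say how the product limbs map onto %l0..%l7. The helpers mul_nxn, to_limbs, from_limbs, cycles_per_limb_product and quadratic_fit are hypothetical, not GMP or SPARC APIs. MPMUL_CYCLES holds the cycle counts quoted in the post.

```python
# Hypothetical reference model; names are illustrative, not GMP/SPARC APIs.
LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1

# Cycle counts for the 'mpmul' instruction alone, N = 1..16, as quoted.
MPMUL_CYCLES = [79, 84, 90, 98, 108, 120, 134, 150,
                168, 188, 210, 234, 260, 288, 318, 350]


def to_limbs(x, n):
    """Split a non-negative int into n 64-bit limbs, least significant first."""
    return [(x >> (LIMB_BITS * k)) & LIMB_MASK for k in range(n)]


def from_limbs(limbs):
    """Reassemble limbs (least significant first) into an int."""
    return sum(limb << (LIMB_BITS * k) for k, limb in enumerate(limbs))


def mul_nxn(a, b):
    """Schoolbook N x N limb product returning 2N limbs (the N == M case)."""
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same number of limbs")
    r = [0] * (2 * n)
    for i in range(n):
        carry = 0
        # Row i: add a[i] * b into r[i .. i+n-1], propagating the carry.
        for j in range(n):
            t = r[i + j] + a[i] * b[j] + carry
            r[i + j] = t & LIMB_MASK
            carry = t >> LIMB_BITS
        r[i + n] = carry  # r[i + n] is still zero before this row
    return r


def cycles_per_limb_product(n):
    """Quoted mpmul cycles divided by the n*n 64x64-bit partial products."""
    return MPMUL_CYCLES[n - 1] / (n * n)


def quadratic_fit(n):
    """Empirical fit to the quoted counts: exact for 2 <= n <= 16,
    one cycle high at n = 1 (80 vs 79). Not a documented formula."""
    return n * n + n + 78


if __name__ == "__main__":
    # 4x4 limb multiply: two 256-bit operands, 8-limb (512-bit) product.
    x = (1 << 256) - 1
    prod = mul_nxn(to_limbs(x, 4), to_limbs(x, 4))
    assert from_limbs(prod) == x * x
    print([hex(limb) for limb in prod])

    for n in (1, 2, 3, 4, 8, 16):
        print(f"N={n:2d}: {MPMUL_CYCLES[n - 1]:3d} cycles, "
              f"{cycles_per_limb_product(n):.2f} cycles per 64x64 product")
```

Dividing the quoted counts by the N*N partial products shows how the cost amortizes: 79 cycles per 64x64 product at N=1, 10 at N=3, about 6.1 at N=4 and about 1.37 at N=16. The counts also fit N*N + N + 78 exactly for 2 <= N <= 16; this is an observation about these sixteen numbers only. None of it includes the ldx/save/restore/stx setup shown in the post, which adds to every call and matters most at small N.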