Hi,

On Fri, 4 January 2013 10:07 am, David Miller wrote:
> mpmul 3 ! The immediate field is "N - 1"

Does the immediate field mean that, to write e.g. sqr_basecase (which should be far simpler than writing mul_basecase), you need a separate branch for each value of N?

> The circuit does scale very well, for example here are cycle counts
> for just the 'mpmul' instruction itself for N from 1 to 16:
> 79
> 84
[...]
> 318
> 350

That is (almost) exactly 78 + N + N*N: a big fixed latency plus a quadratic algorithm. When 2n > 26, we have 78 + 2n + (2n)*(2n) > 3*(78 + n + n*n) (equivalently n*n - n > 156), so a single 2n x 2n mpmul costs more than the three n x n products of a Karatsuba step... I'm curious to see where the Karatsuba threshold will be.

> Anyways I obviously have a lot of experimenting and tinkering to do,
> so I'll come back here once I have a better idea of how we might use
> 'mpmul' most effectively.

We hope to hear from you soon ;-)

On Fri, 4 January 2013 3:54 am, David Miller wrote:
> Yes, the mpmul instruction is limited to balanced NxN multiplies.
>
> Well, actually, we could use this mpmul instruction for NxM cases by
> padding the unused parameters with zeros. That way we could support
> any case where N <= 32 and M <= 32.

It seems better to perform a single 8x4 product (padded to 8x8, 150 cycles) than two 4x4 ones (98+98 = 196 cycles); a 12x4 product (padded to 12x12, 234 cycles) also seems better than three 4x4 ones (98+98+98 = 294)... maybe mul_4 is not worth writing, but mul_6 is...

> Making this for crypto would be of no value for T4, because as
> mentioned the chip has other instructions that more directly support
> modular arithmetic in the form of 'montmul' and 'montsqr'
> instructions.

Cryptography beyond 2048 bits is possible :-)

--
http://bodrato.it/
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel
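
As a rough check of the cycle arithmetic in this thread, here is a minimal Python sketch of a cost model. It assumes the fitted latency of 78 + N + N*N cycles per N x N mpmul, extrapolated beyond the quoted N = 16 range. It counts only the three sub-products of a Karatsuba step and ignores the additions and carry handling, so a real threshold would sit higher. The function names are hypothetical, for illustration only.

```python
def mpmul_cycles(n: int) -> int:
    """Approximate latency of one N x N mpmul, fitted from the quoted counts.

    The fit matches the quoted N = 2, 15 and 16 values exactly; the
    measured N = 1 count (79) is one cycle below it.
    """
    return 78 + n + n * n


def karatsuba_crossover() -> int:
    """Smallest n where one (2n x 2n) mpmul costs more than three (n x n) ones.

    Only the three sub-products of a Karatsuba step are counted; additions,
    subtractions and carries are ignored (an assumption of this sketch).
    """
    n = 1
    while mpmul_cycles(2 * n) <= 3 * mpmul_cycles(n):
        n += 1
    return n


def padded_vs_split(big: int, small: int) -> tuple[int, int]:
    """Compare a big x small product done two ways (big % small == 0 assumed):
    one mpmul with the short operand zero-padded to big x big, versus
    big // small separate small x small mpmuls (accumulation cost ignored).
    """
    padded = mpmul_cycles(big)
    split = (big // small) * mpmul_cycles(small)
    return padded, split


if __name__ == "__main__":
    print("Karatsuba crossover n =", karatsuba_crossover())
    print("8x4:  padded, split =", padded_vs_split(8, 4))
    print("12x4: padded, split =", padded_vs_split(12, 4))
```

At n = 13 the two sides tie (780 cycles each), which is why the strict inequality in the thread reads 2n > 26; the model therefore first favours the three smaller products at n = 14.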