From: ni...@lysator.liu.se (Niels Möller)
Date: Fri, 04 Jan 2013 08:48:21 +0100

> David Miller <da...@davemloft.net> writes:
>
>> Just FYI, I'm also working on an mpn_mul_basecase that makes use of
>> the T4 'mpmul' instruction which can do NxN 64-bit limb multiplies
>> for values of N from 1 to 32.
>
> It might make sense to experiment with an mpn_addmul_2 before doing
> mpn_mul_basecase.

The crossover point where mpmul becomes faster than a flat-out
mulx/umulxhi loop lies beyond 2x2 limbs, so I don't see any value in
looking into mpn_addmul_2 just yet.

There's a lot of setup and teardown associated with using mpmul,
because it uses several register windows and some of the floating
point registers to hold the entire set of inputs and to provide the
result.

That's why, realistically, I'll probably only use mpmul for 3x3 and
larger.
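
For reference, here is a rough portable C sketch of what the "flat-out
mulx/umulxhi loop" side of that comparison looks like. It is not the
GMP code or the SPARC assembly. The names (limb_t, sketch_mul_1,
sketch_addmul_1, sketch_mul_basecase, sketch_use_mpmul) are made up for
illustration, the low/high product helpers assume the GCC/Clang
unsigned __int128 extension as a stand-in for mulx and umulxhi, and the
threshold of 3 is only the tentative figure mentioned above, not a
measured value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical name for a 64-bit limb */

/* Low and high halves of a 64x64-bit product: the jobs done by mulx and
   umulxhi. The high half assumes unsigned __int128 (GCC/Clang). */
static inline limb_t mul_lo(limb_t a, limb_t b) { return a * b; }
static inline limb_t mul_hi(limb_t a, limb_t b)
{
    return (limb_t)(((unsigned __int128)a * b) >> 64);
}

/* rp[0..n-1] = up[0..n-1] * v; returns the carry-out limb. */
static limb_t sketch_mul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t lo = mul_lo(up[i], v);
        limb_t hi = mul_hi(up[i], v);
        lo += carry;
        hi += (lo < carry);      /* propagate carry from the low half */
        rp[i] = lo;
        carry = hi;
    }
    return carry;
}

/* rp[0..n-1] += up[0..n-1] * v; returns the carry-out limb. */
static limb_t sketch_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t lo = mul_lo(up[i], v);
        limb_t hi = mul_hi(up[i], v);
        lo += carry;
        hi += (lo < carry);
        lo += rp[i];
        hi += (lo < rp[i]);      /* carry from adding the existing limb */
        rp[i] = lo;
        carry = hi;
    }
    return carry;
}

/* Schoolbook product: rp[0..un+vn-1] = up[0..un-1] * vp[0..vn-1].
   rp must not overlap the inputs. */
static void sketch_mul_basecase(limb_t *rp, const limb_t *up, size_t un,
                                const limb_t *vp, size_t vn)
{
    assert(un >= 1 && vn >= 1);
    rp[un] = sketch_mul_1(rp, up, un, vp[0]);
    for (size_t j = 1; j < vn; j++)
        rp[un + j] = sketch_addmul_1(rp + j, up, un, vp[j]);
}

/* Hypothetical dispatch rule: below 3x3 the setup and teardown cost of
   mpmul is assumed to dominate; mpmul itself handles N up to 32. */
static int sketch_use_mpmul(size_t n)
{
    return n >= 3 && n <= 32;
}
```

An mpn_addmul_2 would process two limbs of vp per pass over up instead
of one, which is the variant being discussed in the quoted text.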