Re: speed of unbalanced multiplication
Marco, Date: Sat, 26 Jan 2013 16:21:28 +0100 (CET) From: bodr...@mail.dm.unipi.it Ciao, Il Sab, 26 Gennaio 2013 4:01 pm, bodr...@mail.dm.unipi.it ha scritto: I mean, which timing do you obtain with the following? ./speed -s $((100+775660)/2) mpn_mul_n mpn_mul_n Sorry... I mean: ./speed -s $[(100+775660)/2] mpn_mul_n mpn_mul_n here is another example: frite% ./speed -s 60 mpn_mul.100 mpn_mul.100 mpn_mul.100 overhead 0.2 secs, precision 1 units of 3.33e-10 secs, CPU freq 3000.00 MHz mpn_mul.100 mpn_mul.100 mpn_mul.100 60 #0.716044000 0.720045000 0.716045000 frite% ./speed -s 80 mpn_mul_n mpn_mul_n mpn_mul_n overhead 0.2 secs, precision 1 units of 3.33e-10 secs, CPU freq 3000.00 MHz mpn_mul_n mpn_mul_n mpn_mul_n 800.676042000 0.680043000 #0.672042000 In the FTT range, multiplying n limbs by m limbs should not be more expensive then multiplying two numbers of (n+m)/2 limbs. In that case, we get a 6.5% slowdown with respect to that strategy. Note that I did not try to get the largest possible slowdown. Paul ___ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel
Re: Toom sqr recursion
Torbjorn Granlund t...@gmplib.org writes: I just spotted that we're not depending on SQR_BASECASE_THRESHOLD in any of the recursive toomN_sqr calls. I think we should, at least from toom2_sqr. And in toom44_mul.c and toom4_sqr.c we set MAYBE_*_toom4* using MUL_FFT_THRESHOLD when we should really consider TOOM6 or (if TOOM6 = 0) TOOM8. This was forgotten when the 6H and 8H code was merged. -- Torbjörn ___ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel
Re: Toom sqr recursion
Torbjorn Granlund t...@gmplib.org writes: And in toom44_mul.c and toom4_sqr.c we set MAYBE_*_toom4* using MUL_FFT_THRESHOLD when we should really consider TOOM6 or (if TOOM6 = 0) TOOM8. This was forgotten when the 6H and 8H code was merged. Here's is a possible patch set: diff -Nrc2 gmp-powm-powm_ui.8ec61b8b6882/mpn/generic/toom44_mul.c gmp-powm-powm_ui/mpn/generic/toom44_mul.c *** gmp-powm-powm_ui.8ec61b8b6882/mpn/generic/toom44_mul.c Sun Jan 27 22:05:24 2013 --- gmp-powm-powm_ui/mpn/generic/toom44_mul.c Sun Jan 27 22:05:24 2013 *** *** 51,54 --- 51,61 #define MAYBE_mul_toom44 1 #else + #if MUL_TOOM6H_THRESHOLD != 0 + #define MUL_NEXTALG_THRESHOLD MUL_TOOM6H_THRESHOLD + #elif MUL_TOOM8H_THRESHOLD != 0 + #define MUL_NEXTALG_THRESHOLD MUL_TOOM8H_THRESHOLD + #else + #define MUL_NEXTALG_THRESHOLD MUL_FFT_THRESHOLD + #endif #define MAYBE_mul_basecase\ (MUL_TOOM44_THRESHOLD 4 * MUL_TOOM22_THRESHOLD) *** *** 56,60 (MUL_TOOM44_THRESHOLD 4 * MUL_TOOM33_THRESHOLD) #define MAYBE_mul_toom44 \ ! (MUL_FFT_THRESHOLD = 4 * MUL_TOOM44_THRESHOLD) #endif --- 63,67 (MUL_TOOM44_THRESHOLD 4 * MUL_TOOM33_THRESHOLD) #define MAYBE_mul_toom44 \ ! (MUL_NEXTALG_THRESHOLD = 4 * MUL_TOOM44_THRESHOLD) #endif diff -Nrc2 gmp-powm-powm_ui.8ec61b8b6882/mpn/generic/toom4_sqr.c gmp-powm-powm_ui/mpn/generic/toom4_sqr.c *** gmp-powm-powm_ui.8ec61b8b6882/mpn/generic/toom4_sqr.c Sun Jan 27 22:05:24 2013 --- gmp-powm-powm_ui/mpn/generic/toom4_sqr.cSun Jan 27 22:05:24 2013 *** *** 48,51 --- 48,58 #define MAYBE_sqr_toom4 1 #else + #if SQR_TOOM6H_THRESHOLD != 0 + #define SQR_NEXTALG_THRESHOLD SQR_TOOM6H_THRESHOLD + #elif SQR_TOOM8H_THRESHOLD != 0 + #define SQR_NEXTALG_THRESHOLD SQR_TOOM8H_THRESHOLD + #else + #define SQR_NEXTALG_THRESHOLD SQR_FFT_THRESHOLD + #endif #define MAYBE_sqr_basecase\ (SQR_TOOM4_THRESHOLD 4 * SQR_TOOM2_THRESHOLD) *** *** 53,57 (SQR_TOOM4_THRESHOLD 4 * SQR_TOOM3_THRESHOLD) #define MAYBE_sqr_toom4 \ ! (SQR_FFT_THRESHOLD = 4 * SQR_TOOM4_THRESHOLD) #endif --- 60,64 (SQR_TOOM4_THRESHOLD 4 * SQR_TOOM3_THRESHOLD) #define MAYBE_sqr_toom4 \ ! (SQR_NEXTALG_THRESHOLD = 4 * SQR_TOOM4_THRESHOLD) #endif -- Torbjörn ___ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel
Extending the mpn interface
I'm currently working on some cryptograpic elliptic curve code, using gmp for underlying bignum numbers (of moderate size, largest operation is multiply with 17 limb inputs). I think I want to keep using mpz_t for user-visible functions, but I'm leaning towards using the mpn interface internally, for two main reasons: * I want to get control over the timing of operations, to avoid timing side-channels * I'd like to write optimized code for computing mod p for certain fixed p with nice structure, and those functions have to work on mpn-representation. But I miss some features in the public interface to do this conveniently; both for mpn - mpz interoperation, and there are also some mpn functions which would be very useful but which are not part of the public interface. For the first issue, it would be nice to have 1. A way to extract an mpn number (pointer and length) from an mpz_t, for read-only use. I.e., getting the _mp_d and _mp_size. 2. A way to extract an mpn number from an mpz_t, for writing. User has to provide maximum size, and then we also need a way to update the actual size when done. Basically, this is MPZ_REALLOC/MPZ_NEWALLOC and MPN_NORMALIZE. 3. A way to construct an mpz_t from an mpn, for read-only use. A bit like MPZ_TMP_INIT, but preferably with a macro expanding to an initializer, something like #define MPZ_FROM_MPN(p, n) {{ 0, (n), (p) }} mpz_t x = MPZ_FROM_MPN (xp, xn); Then this could be used also for compile-time constant mpz values. Currently, the only really kosher way to do this is to go via mpz_export and mpz_import, which isn't particularly convenient, and which also adds a bit more overhead than I'd like. What do you think? I think such interfaces can be defined to be safe across gmp releases. It's ok with function-call overhead for (1) and (2) and if necessary even for (3), but it would be ideal if they can be defined without any extra overhead, so that the same interfaces can be used internally in GMP. For the second issue, here are some functions which I'm missing in the public interface: * mpn_mul_basecase (possibly under some other name, like mpn_mul_small or mpn_mul_sec). The most important thing is to have a side-channel silent multiply, but it would be nice to also avoid overhead which is needed only for largish numbers. * mpn_addlsh_n, with fallback to mpn_addmul_1, if there's no special loop for this. Possibly also for other variants of shift-and-add. * mpn_addcnd_n and mpn_subcnd_n (BTW, I wonder if the current C implementation is faster or slower than calling assembly mpn_addmul_1?). Possibly also mpn_addcnd_nc and mpn_subcnd_nc. * mpn_powm_sec. * mpn_add_nc, mpn_sub_nc (would it be difficult to ensure these are defined on all platforms)? I think these functions are sufficiently well-understood that we can design and commit to a public interface. And a couple of functions I would need are missing: * mpn_copycnd, like void mpn_copycnd (int cnd, mp_limb_t *rp, const mp_limb_t *ap, mp_size_t n) { mp_limb_t mask, keep; mp_size_t i; mask = -(mp_limb_t) (cnd !=0); keep = ~mask; for (i = 0; i n; i++) rp[i] = (rp[i] keep) + (ap[i] mask); } (related to mpn_tabselect). * mpn_add_1_sec and mpn_sub_1_sec. O(n) operations, always propagating carry all the way. I you think some or all of these extensions of the mpn interface make sense, I may be able to spend some time on this. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel