Re: speed of unbalanced multiplication

2013-01-27 Thread Zimmermann Paul
   Marco,

 Date: Sat, 26 Jan 2013 16:21:28 +0100 (CET)
 From: bodr...@mail.dm.unipi.it
 
 Ciao,
 
 Il Sab, 26 Gennaio 2013 4:01 pm, bodr...@mail.dm.unipi.it ha scritto:
  I mean, which timing do you obtain with the following?
  ./speed -s $((100+775660)/2) mpn_mul_n mpn_mul_n
 
 Sorry... I mean:
 
 ./speed -s $[(100+775660)/2] mpn_mul_n mpn_mul_n

here is another example:

frite% ./speed -s 60 mpn_mul.100 mpn_mul.100 mpn_mul.100
overhead 0.2 secs, precision 1 units of 3.33e-10 secs, CPU freq 
3000.00 MHz
mpn_mul.100 mpn_mul.100 mpn_mul.100
60   #0.716044000   0.720045000   0.716045000

frite% ./speed -s 80 mpn_mul_n mpn_mul_n mpn_mul_n
overhead 0.2 secs, precision 1 units of 3.33e-10 secs, CPU freq 
3000.00 MHz
mpn_mul_n mpn_mul_n mpn_mul_n
800.676042000   0.680043000  #0.672042000

In the FTT range, multiplying n limbs by m limbs should not be more expensive
then multiplying two numbers of (n+m)/2 limbs.

In that case, we get a 6.5% slowdown with respect to that strategy.

Note that I did not try to get the largest possible slowdown.

Paul

___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: Toom sqr recursion

2013-01-27 Thread Torbjorn Granlund
Torbjorn Granlund t...@gmplib.org writes:

  I just spotted that we're not depending on SQR_BASECASE_THRESHOLD in any
  of the recursive toomN_sqr calls.  I think we should, at least from
  toom2_sqr.

And in toom44_mul.c and toom4_sqr.c we set MAYBE_*_toom4* using
MUL_FFT_THRESHOLD when we should really consider TOOM6 or (if TOOM6 = 0)
TOOM8.  This was forgotten when the 6H and 8H code was merged.

-- 
Torbjörn
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Re: Toom sqr recursion

2013-01-27 Thread Torbjorn Granlund
Torbjorn Granlund t...@gmplib.org writes:

  And in toom44_mul.c and toom4_sqr.c we set MAYBE_*_toom4* using
  MUL_FFT_THRESHOLD when we should really consider TOOM6 or (if TOOM6 = 0)
  TOOM8.  This was forgotten when the 6H and 8H code was merged.

Here's is a possible patch set:

diff -Nrc2 gmp-powm-powm_ui.8ec61b8b6882/mpn/generic/toom44_mul.c 
gmp-powm-powm_ui/mpn/generic/toom44_mul.c
*** gmp-powm-powm_ui.8ec61b8b6882/mpn/generic/toom44_mul.c  Sun Jan 27 
22:05:24 2013
--- gmp-powm-powm_ui/mpn/generic/toom44_mul.c   Sun Jan 27 22:05:24 2013
***
*** 51,54 
--- 51,61 
  #define MAYBE_mul_toom44   1
  #else
+ #if MUL_TOOM6H_THRESHOLD != 0
+ #define MUL_NEXTALG_THRESHOLD MUL_TOOM6H_THRESHOLD
+ #elif MUL_TOOM8H_THRESHOLD != 0
+ #define MUL_NEXTALG_THRESHOLD MUL_TOOM8H_THRESHOLD
+ #else
+ #define MUL_NEXTALG_THRESHOLD MUL_FFT_THRESHOLD
+ #endif
  #define MAYBE_mul_basecase\
(MUL_TOOM44_THRESHOLD  4 * MUL_TOOM22_THRESHOLD)
***
*** 56,60 
(MUL_TOOM44_THRESHOLD  4 * MUL_TOOM33_THRESHOLD)
  #define MAYBE_mul_toom44  \
!   (MUL_FFT_THRESHOLD = 4 * MUL_TOOM44_THRESHOLD)
  #endif
  
--- 63,67 
(MUL_TOOM44_THRESHOLD  4 * MUL_TOOM33_THRESHOLD)
  #define MAYBE_mul_toom44  \
!   (MUL_NEXTALG_THRESHOLD = 4 * MUL_TOOM44_THRESHOLD)
  #endif
  
diff -Nrc2 gmp-powm-powm_ui.8ec61b8b6882/mpn/generic/toom4_sqr.c 
gmp-powm-powm_ui/mpn/generic/toom4_sqr.c
*** gmp-powm-powm_ui.8ec61b8b6882/mpn/generic/toom4_sqr.c   Sun Jan 27 
22:05:24 2013
--- gmp-powm-powm_ui/mpn/generic/toom4_sqr.cSun Jan 27 22:05:24 2013
***
*** 48,51 
--- 48,58 
  #define MAYBE_sqr_toom4   1
  #else
+ #if SQR_TOOM6H_THRESHOLD != 0
+ #define SQR_NEXTALG_THRESHOLD SQR_TOOM6H_THRESHOLD
+ #elif SQR_TOOM8H_THRESHOLD != 0
+ #define SQR_NEXTALG_THRESHOLD SQR_TOOM8H_THRESHOLD
+ #else
+ #define SQR_NEXTALG_THRESHOLD SQR_FFT_THRESHOLD
+ #endif
  #define MAYBE_sqr_basecase\
(SQR_TOOM4_THRESHOLD  4 * SQR_TOOM2_THRESHOLD)
***
*** 53,57 
(SQR_TOOM4_THRESHOLD  4 * SQR_TOOM3_THRESHOLD)
  #define MAYBE_sqr_toom4   
\
!   (SQR_FFT_THRESHOLD = 4 * SQR_TOOM4_THRESHOLD)
  #endif
  
--- 60,64 
(SQR_TOOM4_THRESHOLD  4 * SQR_TOOM3_THRESHOLD)
  #define MAYBE_sqr_toom4   
\
!   (SQR_NEXTALG_THRESHOLD = 4 * SQR_TOOM4_THRESHOLD)
  #endif
  


-- 
Torbjörn
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel


Extending the mpn interface

2013-01-27 Thread Niels Möller
I'm currently working on some cryptograpic elliptic curve code, using
gmp for underlying bignum numbers (of moderate size, largest operation
is multiply with 17 limb inputs).

I think I want to keep using mpz_t for user-visible functions, but I'm
leaning towards using the mpn interface internally, for two main
reasons:

* I want to get control over the timing of operations, to avoid timing
  side-channels

* I'd like to write optimized code for computing mod p for certain fixed
  p with nice structure, and those functions have to work on
  mpn-representation.

But I miss some features in the public interface to do this
conveniently; both for mpn - mpz interoperation, and there are also
some mpn functions which would be very useful but which are not part of
the public interface.

For the first issue, it would be nice to have

1. A way to extract an mpn number (pointer and length) from an mpz_t,
   for read-only use. I.e., getting the _mp_d and _mp_size.

2. A way to extract an mpn number from an mpz_t, for writing. User has
   to provide maximum size, and then we also need a way to update the
   actual size when done. Basically, this is MPZ_REALLOC/MPZ_NEWALLOC
   and MPN_NORMALIZE.

3. A way to construct an mpz_t from an mpn, for read-only use. A bit
   like MPZ_TMP_INIT, but preferably with a macro expanding to an
   initializer, something like

 #define MPZ_FROM_MPN(p, n) {{ 0, (n), (p) }}
   
 mpz_t x = MPZ_FROM_MPN (xp, xn);

   Then this could be used also for compile-time constant mpz values.

Currently, the only really kosher way to do this is to go via mpz_export
and mpz_import, which isn't particularly convenient, and which also adds
a bit more overhead than I'd like.

What do you think? I think such interfaces can be defined to be safe
across gmp releases. It's ok with function-call overhead for (1) and (2)
and if necessary even for (3), but it would be ideal if they can be
defined without any extra overhead, so that the same interfaces can be
used internally in GMP.

For the second issue, here are some functions which I'm missing in the
public interface:

* mpn_mul_basecase (possibly under some other name, like mpn_mul_small
  or mpn_mul_sec). The most important thing is to have a side-channel
  silent multiply, but it would be nice to also avoid overhead which is
  needed only for largish numbers.

* mpn_addlsh_n, with fallback to mpn_addmul_1, if there's no special
  loop for this. Possibly also for other variants of shift-and-add.

* mpn_addcnd_n and mpn_subcnd_n (BTW, I wonder if the current C
  implementation is faster or slower than calling assembly
  mpn_addmul_1?). Possibly also mpn_addcnd_nc and mpn_subcnd_nc.

* mpn_powm_sec.

* mpn_add_nc, mpn_sub_nc (would it be difficult to ensure these are
  defined on all platforms)?

I think these functions are sufficiently well-understood that we can
design and commit to a public interface.

And a couple of functions I would need are missing:

* mpn_copycnd, like

void
mpn_copycnd (int cnd, mp_limb_t *rp, const mp_limb_t *ap, mp_size_t n)
{
  mp_limb_t mask, keep;
  mp_size_t i;

  mask = -(mp_limb_t) (cnd !=0);
  keep = ~mask;

  for (i = 0; i  n; i++)
rp[i] = (rp[i]  keep) + (ap[i]  mask);
}

  (related to mpn_tabselect).

* mpn_add_1_sec and mpn_sub_1_sec. O(n) operations, always propagating
  carry all the way.

I you think some or all of these extensions of the mpn interface make
sense, I may be able to spend some time on this.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel