sec_invert performance

2014-02-04 Thread Niels Möller
I'm confused by the pretty bad performance of sec_invert. Try e.g.,

  ./tune/speed -s 1-200 -P invert mpn_gcdext mpn_sec_invert mpz_powm_sec

Say we have inputs of n bits. Then my understadning of these algorithms
is:

* mpn_gcdext, for sizes of interest, is Lehmer's algorithm, O(n^2), with
  a reasonably small constant since it reduces roughly one limb per
  iteration.

* mpn_sec_powm does n squarings + redc (and then some n/window_size
  multiplies and table lookups). And the squarings and redc are O(n^2),
  so total running time should be O(n^3).

* mpn_sec_invert has a loop of 2n iterations. Each iterations makes 9
  calls to various O(n) functions, like mpn_cnd_add_n, mpn_rshift, etc.
  So it ought to be O(n^2), and a constant factor slower than
  mpn_gcdext.

But when I run speed, *both* ratios mpn_invert / mpn_gcdext and
mpz_sec_powm / mpn_gcdext keep growing with n. So mpn_invert is not a
constant factor slower than mpn_gcdext, as I would have expected, and
it's significantly slower than mpn_sec_powm for largish sizes like 100
limbs. So it looks like it's O(n^3), not O(n^2).

Any idea what's going on? 

And then, there's also a sharp step at n = 113 or 114 (maybe that's an
artifact of the testmachine, but it's reproducible here).

$ ./speed -s 100-120  -c -r mpn_gcdext mpn_sec_invert mpz_powm_sec 
overhead 5.42 cycles, precision 1 units of 6.25e-10 secs, CPU freq 1600.00 
MHz
   mpn_gcdext mpn_sec_invert  mpz_powm_sec
100#286753.00   97.9985   68.7822
101#290981.00   96.3812   69.3512
102#297710.00   94.5395   68.9482
103#301772.00   94.9747   69.6444
104#308297.00   94.8706   69.0619
105#310131.00   96.1709   70.7960
106#312842.00   96.6907   70.8039
107#321814.00   96.2753   70.4176
108#313583.00  100.2901   73.2001
109#321223.00  100.3469   72.9247
110#328283.00   99.1086   72.5408
111#333812.00   99.4312   72.9408
112#335619.00  100.7185   73.4388
113#343734.00   46.7804   73.1944
114#352574.00   46.8327   72.9897
115#356299.00   46.9179   73.3826
116#366523.00   46.0239   72.2455
117#371422.00   46.5541   72.7283
118#367404.00   50.3468   74.7026
119#374872.00   48.6011   74.7945
120#376131.00   48.3491   75.4658

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: sec_invert performance

2014-02-04 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes:

  Any idea what's going on? 
  
I quick guess is that the exponent is fixed for powm, not a function of
the input size.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: sec_invert performance

2014-02-04 Thread Niels Möller
Torbjorn Granlund t...@gmplib.org writes:

 I quick guess is that the exponent is fixed for powm, not a function of
 the input size.

Ah, its hardcoded to always use 6 limbs. If I change it to use the same
size as b and m, results are a bit different:

$ ./speed -s 1-64 -c -r mpn_gcdext mpn_sec_invert mpz_powm_sec
overhead 5.43 cycles, precision 1 units of 6.25e-10 secs, CPU freq 1600.00 
MHz
   mpn_gcdext mpn_sec_invert  mpz_powm_sec
1#1393.539.25632.8281
2#3700.838.45302.8537
3#5628.829.22964.7198
4#7387.089.87056.5449
5#9270.67   12.22719.4494
6   #11102.89   13.2985   11.9077
7   #12923.54   14.2668   14.8305
8   #14985.50   14.9848   17.2423
9   #16826.96   17.1405   21.4808
10  #18998.00   17.9614   24.7252
11  #20390.00   19.6006   29.7402
12  #23244.21   19.9373   32.0728
13  #24235.58   22.4859   39.6380
14  #26027.70   23.5316   44.4109
15  #29157.89   23.7497   47.7260
16  #30368.62   26.6497   53.3113
17  #32534.43   28.4042   60.1591
18  #33580.67   29.1482   69.0387
19  #34424.60   31.2023   78.6099
20  #36470.00   32.4313   84.8543
21  #36405.25   35.8625   99.2162
22  #41335.25   33.8338   97.9815
23  #39861.67   38.2781  116.4425
24  #41421.67   39.4835  124.2736
25  #44087.67   41.0079  138.0154
26  #48386.33   39.9403  139.1569
27  #40868.00   50.5410  184.1573

So if you know that the modulo is prime (or has some other type of known
factorization), using powm actually beats sec_invert for sizes up to 6
limbs.

I still puzzled why the ratio between sec_invert and gcdext doesn't
approach some constant (I'd expect the ratio to start growing again for
sizes above the threshold for subquadratic gcdext).

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: sec_invert performance

2014-02-04 Thread Niels Möller
Torbjorn Granlund t...@gmplib.org writes:

 All linear-time functions really take Bn+C time, and quadratic-time
 functions take An²+Bn+C, of course.  If the sec_invert function has
 small B and/or C while gcdext has larger B and/or C, then the observed
 pattern seem even more reasonable.  Do you agree?

Not sure. I had expected the quadratic terms to dominate (and hence the
ratio almost converging to the ratio between the n^2 coefficients), well
before we cross the GCDEXT_DC_THRESHOLD which is on the order of 500
limbs.

But I haven't had any deep thoughts about it. (I guess one could also
try to estimate all of A, B and C from the measurements. Putting such
numerics into speed is something I'd like to do sometime).

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel