Re: GMP used during 3 and a half years to solve MIT's LCS35

2019-12-16 Thread Marco Bodrato

Ciao,

Il 2019-12-16 16:30 t...@gmplib.org ha scritto:

paul zimmermann  writes:

  thank you for the benchmark, which shows a clear gain with the 
popcount

  method when the exponent is sparse, and no regression when it is not.

I believe that the amount of analysis of the exponent we can afford is 
a

function of the expected computation time.


I agree. For size n=1 the popcount method is often slower.


A popcount adds linear time in the exponent size.  If the base and
modular argument are small enough, that will add noticeable overhead.

We might want to do a more sophisticated analysis of the exponent when
the base and modular are are large enough.


I think we can look at the exponent only. The window size is based on 
it.


Ĝis,
m
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: GMP used during 3 and a half years to solve MIT's LCS35

2019-12-16 Thread Torbjörn Granlund
paul zimmermann  writes:

  thank you for the benchmark, which shows a clear gain with the popcount
  method when the exponent is sparse, and no regression when it is not.

I believe that the amount of analysis of the exponent we can afford is a
function of the expected computation time.

A popcount adds linear time in the exponent size.  If the base and
modular argument are small enough, that will add noticeable overhead.

We might want to do a more sophisticated analysis of the exponent when
the base and modular are are large enough.

-- 
Torbjörn
Please encrypt, key id 0xC8601622
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: GMP used during 3 and a half years to solve MIT's LCS35

2019-12-16 Thread paul zimmermann
   Dear Marco,

thank you for the benchmark, which shows a clear gain with the popcount
method when the exponent is sparse, and no regression when it is not.

Paul
___
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel


Re: GMP used during 3 and a half years to solve MIT's LCS35

2019-12-14 Thread Marco Bodrato

Ciao,

Il 2019-05-02 10:09 paul zimmermann ha scritto (on gmp-discuss):

That's why a few weeks ago on this same list I suggested:
"What about even sparser exponents? Should we use 2*popcount(n) 
instead of
size_in_bits(n) to estimate the best window size? They should be 
almost

equivalent on average."

[ https://gmplib.org/list-archives/gmp-discuss/2019-April/006310.html 
]



I support the (clever) idea of using 2*popcount(exponent).
Could you benchmark this against the current version, on
dense and sparse exponents?


I tried, with the attached code (replace tune/speed-ext.c, then cd 
tune;make speed-ext). It contains a copy of the current 
mpn/generic/powm.c code (and mpz/powm.c) with an additional flag and a 
branch:


  if (flag != 0)
windowsize = win_size (mpn_popcount (ep, en) * 2 - 1);
  else
windowsize = win_size (ebi);

This way we can compare both ways to estimate the window size.
Unfortunately, there are large fluctuations while measuring this 
function, for each size I measure three _bs and three _pc (bitsize vs. 
popcount).


I used the command-line

for i in 0 1 2 4 16 256; do tune/speed-ext -rs1-260 -f2 mpz_powm_bs.$i 
mpz_powm_pc.$i mpz_powm_bs.$i mpz_powm_pc.$i mpz_powm_bs.$i 
mpz_powm_pc.$i; done


Because of the "-r", the first column is the time used, the other 
columns contain the ratio.


With r-flag=0 (random exponent) the speed are comparable. Half the times 
the winner is _bs, half the times the winner is _pc:


overhead 0.2 secs, precision 1 units of 2.86e-10 secs, CPU 
freq 3500.09 MHz

 _bs.0   _pc.0 _bs.0 _pc.0 _bs.0 _pc.0
1 0.008490.99730.99540.9968   #0.99290.9951
2 0.025831.11070.99360.99590.9868   #0.9862
4 0.134730.99510.9952   #0.99470.99610.9947
8 0.708500.99830.99870.99780.9981   #0.9977
160.0004817970.96240.9554   #0.91610.96020.9537
320.0033078780.99790.99710.99460.9998   #0.9780
640.0215020001.00130.99790.9972   #0.99600.9973
128  #0.1267490001.00361.01041.01311.00591.0032
256   0.7389980001.00010.99610.9995   #0.99531.0008

With r-flag=1 (exponent is set to binary 111...111), popcount may 
suggest a larger window-size, but again no clear winner:


overhead 0.2 secs, precision 1 units of 2.86e-10 secs, CPU 
freq 3500.09 MHz

 _bs.1   _pc.1 _bs.1 _pc.1 _bs.1 _pc.1
1 0.008780.99671.0014   #0.99211.00160.9978
2 0.029701.00530.98460.9871   #0.98290.9949
4 0.134000.99950.99770.9973   #0.99730.9990
8 0.717760.99480.99890.99500.9998   #0.9947
160.0004750790.97010.9993   #0.96050.96270.9691
320.003303542   #0.99681.00121.00831.01231.0168
640.0218030001.00931.01170.99711.0008   #0.9928
128  #0.1282250001.00471.00421.01291.00181.0083
256   0.7462840001.0051   #0.99771.00060.99881.0058

With sparser exponents (with r-flag=2, half of the exponent is random 
data, half is 0; with r-flag=4, one fourth of the exponent is random 
data; and so on...), popcount suggests a smaller window-size, and seems 
effective:


overhead 0.2 secs, precision 1 units of 2.86e-10 secs, CPU 
freq 3500.09 MHz

 _bs.4   _pc.4 _bs.4 _pc.4 _bs.4 _pc.4
1 0.007650.98450.97530.9818   #0.97360.9842
2 0.022850.98391.00160.98141.0022   #0.9810
4 0.117730.99411.0026   #0.98201.00010.9988
8 0.61283   #0.98840.99750.98891.94931.9487
160.0003945271.03431.0565   #0.99180.99961.0378
320.0029869440.97580.9840   #0.96630.97940.9669
64   #0.0190950001.00251.00981.00961.01941.0104
128   0.1162890000.99580.99830.99410.9994   #0.9933
256   0.6812130000.99120.9948   #0.99021.00150.9905
overhead 0.2 secs, precision 1 units of 2.86e-10 secs, CPU 
freq 3500.09 MHz

 _bs.16  _pc.16_bs.16_pc.16_bs.16_pc.16
1 0.007290.98750.9652   #0.95150.95360.9849
2 0.02228   #0.94500.99530.94750.99480.9477
4 0.11343   #0.94770.99520.94800.99430.9518
8 0.588050.97840.99840.97600.9976   #0.9759
160.000381547   #0.97841.01860.99121.00070.9927
320.002858413   #0.96950.99480.97931.00290.9706
640.0191230000.97940.9918   #0.97160.99630.9876
128   0.113480.98920.9974   #0.98781.00430.9907
256   0.6680130000.98450.99790.98480.9980   #0.9814
overhead 0.2 secs, precision 1 units of 2.86e-10 secs, CPU 
freq 3500.09 MHz