Ciao,
Il 2019-05-02 10:09 paul zimmermann ha scritto (on gmp-discuss):
That's why a few weeks ago on this same list I suggested:
"What about even sparser exponents? Should we use 2*popcount(n)
instead of
size_in_bits(n) to estimate the best window size? They should be
almost
equivalent on average."
[ https://gmplib.org/list-archives/gmp-discuss/2019-April/006310.html
]
I support the (clever) idea of using 2*popcount(exponent).
Could you benchmark this against the current version, on
dense and sparse exponents?
I tried, with the attached code (replace tune/speed-ext.c, then cd
tune;make speed-ext). It contains a copy of the current
mpn/generic/powm.c code (and mpz/powm.c) with an additional flag and a
branch:
if (flag != 0)
windowsize = win_size (mpn_popcount (ep, en) * 2 - 1);
else
windowsize = win_size (ebi);
This way we can compare both ways to estimate the window size.
Unfortunately, there are large fluctuations while measuring this
function, for each size I measure three _bs and three _pc (bitsize vs.
popcount).
I used the command-line
for i in 0 1 2 4 16 256; do tune/speed-ext -rs1-260 -f2 mpz_powm_bs.$i
mpz_powm_pc.$i mpz_powm_bs.$i mpz_powm_pc.$i mpz_powm_bs.$i
mpz_powm_pc.$i; done
Because of the "-r", the first column is the time used, the other
columns contain the ratio.
With r-flag=0 (random exponent) the speed are comparable. Half the times
the winner is _bs, half the times the winner is _pc:
overhead 0.2 secs, precision 1 units of 2.86e-10 secs, CPU
freq 3500.09 MHz
_bs.0 _pc.0 _bs.0 _pc.0 _bs.0 _pc.0
1 0.008490.99730.99540.9968 #0.99290.9951
2 0.025831.11070.99360.99590.9868 #0.9862
4 0.134730.99510.9952 #0.99470.99610.9947
8 0.708500.99830.99870.99780.9981 #0.9977
160.0004817970.96240.9554 #0.91610.96020.9537
320.0033078780.99790.99710.99460.9998 #0.9780
640.0215020001.00130.99790.9972 #0.99600.9973
128 #0.1267490001.00361.01041.01311.00591.0032
256 0.7389980001.00010.99610.9995 #0.99531.0008
With r-flag=1 (exponent is set to binary 111...111), popcount may
suggest a larger window-size, but again no clear winner:
overhead 0.2 secs, precision 1 units of 2.86e-10 secs, CPU
freq 3500.09 MHz
_bs.1 _pc.1 _bs.1 _pc.1 _bs.1 _pc.1
1 0.008780.99671.0014 #0.99211.00160.9978
2 0.029701.00530.98460.9871 #0.98290.9949
4 0.134000.99950.99770.9973 #0.99730.9990
8 0.717760.99480.99890.99500.9998 #0.9947
160.0004750790.97010.9993 #0.96050.96270.9691
320.003303542 #0.99681.00121.00831.01231.0168
640.0218030001.00931.01170.99711.0008 #0.9928
128 #0.1282250001.00471.00421.01291.00181.0083
256 0.7462840001.0051 #0.99771.00060.99881.0058
With sparser exponents (with r-flag=2, half of the exponent is random
data, half is 0; with r-flag=4, one fourth of the exponent is random
data; and so on...), popcount suggests a smaller window-size, and seems
effective:
overhead 0.2 secs, precision 1 units of 2.86e-10 secs, CPU
freq 3500.09 MHz
_bs.4 _pc.4 _bs.4 _pc.4 _bs.4 _pc.4
1 0.007650.98450.97530.9818 #0.97360.9842
2 0.022850.98391.00160.98141.0022 #0.9810
4 0.117730.99411.0026 #0.98201.00010.9988
8 0.61283 #0.98840.99750.98891.94931.9487
160.0003945271.03431.0565 #0.99180.99961.0378
320.0029869440.97580.9840 #0.96630.97940.9669
64 #0.0190950001.00251.00981.00961.01941.0104
128 0.1162890000.99580.99830.99410.9994 #0.9933
256 0.6812130000.99120.9948 #0.99021.00150.9905
overhead 0.2 secs, precision 1 units of 2.86e-10 secs, CPU
freq 3500.09 MHz
_bs.16 _pc.16_bs.16_pc.16_bs.16_pc.16
1 0.007290.98750.9652 #0.95150.95360.9849
2 0.02228 #0.94500.99530.94750.99480.9477
4 0.11343 #0.94770.99520.94800.99430.9518
8 0.588050.97840.99840.97600.9976 #0.9759
160.000381547 #0.97841.01860.99121.00070.9927
320.002858413 #0.96950.99480.97931.00290.9706
640.0191230000.97940.9918 #0.97160.99630.9876
128 0.113480.98920.9974 #0.98781.00430.9907
256 0.6680130000.98450.99790.98480.9980 #0.9814
overhead 0.2 secs, precision 1 units of 2.86e-10 secs, CPU
freq 3500.09 MHz