Ciao,

I pushed some new functions to compute the product (and square) modulo B^nn+1, for small values of the exponent nn. For large values the already available fft_mul should be used.

The new functions actually require nn=k*n, where k can be 3 or in {5, 7, 13, 17}.
Also k=11 can be supported, but it's currently disabled.

I added also tune/speed support, but the problem with the analysis is that the function is defined for some values only...

Here are some results:

@shell ~/gmp-repo]$ /var/tmp/bodrato/gmp/hg/build/tune/speed -Cr -s 75-85\,240-250\,725-735 mpn_mul_n mpn_mulmod_bknp1 overhead 7.15 cycles, precision 10000 units of 2.86e-10 secs, CPU freq 3500.09 MHz
            mpn_mul_n mpn_mulmod_bknp1
75           159.4800       #0.9715
76          #167.2895           n/a
77          #167.1558        1.0793
78           193.8462       #0.7311
79          #169.8734           n/a
80           182.1500       #0.8597
81           196.0617       #0.7721
82          #170.8780           n/a
83          #170.8916           n/a
84           169.2024       #0.9344
85           188.8235       #0.7898
240          258.7792       #0.7474
241         #254.4274           n/a
242          266.2355       #0.9051
243          260.2798       #0.7414
244         #256.3893           n/a
245          256.8449       #0.8595
246          255.7073       #0.7915
247          257.0324       #0.9904
248         #257.6532           n/a
249          257.2410       #0.7925
250          258.1480       #0.8535
725          378.2538       #0.8652
726          377.4793       #0.8362
727         #376.6836           n/a
728          376.2060       #0.9164
729          379.7462       #0.7682
730          379.1068       #0.8727
731          379.0109       #0.9805
732          377.9754       #0.8285
733         #379.6958           n/a
734         #377.6403           n/a
735          376.0204       #0.8094

As you can see, the new functions are particularly effective, when there are many factors "3" in the size...

Integrating this in the current mod_bnm1 is easy, but the effect would be to accelerate the operations only for some sizes. This may add "noise" to the thresholds system. Some more experiments are needed.

In the meanwhile, the base code will be tested by the great testing system of GMP :-)

Ĝis,
m
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Reply via email to