Ciao, Il 2022-02-15 11:48 Marco Bodrato ha scritto:
I pushed some new functions to compute the product (and square) modulo B^nn+1, for small values of the exponent nn. For large values the already available fft_mul should be used.
A possible use is replacing the last-level point-wise multiplication of mul_fft.
I tried to do that.
The new functions actually require nn=k*n, where k can be 3 or in {5, 7, 13, 17}.
This means that it can be used only when mul_fft falls into such a size. The results is that for some sizes there is an effect, for other sizes there is none. Not optimal, but an improvement anyway... I attach a graph (fft_with_bknp1.png) obtained plotting the output of tune/speed -s 256-131072 -t128 mpn_mul_fft before (_old) and after (_new) the patch.Unfortunately the current tuning criteria does not interact well with the added code. If the library is re-tuned, then there are sizes where a worst combination is chosen and the effect is a slowdown. I attach another graph (fft_retuned.png) with the timings obtained as above, the "_retuned" line was measured after re-tuning (make -C tune tune).
I don't know how the generation of FFT_TABLE works, should we change it somehow?
Ĝis, m
_______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel