Ciao,

Il 2022-02-15 11:48 Marco Bodrato ha scritto:
I pushed some new functions to compute the product (and square) modulo
B^nn+1, for small values of the exponent nn. For large values the
already available fft_mul should be used.

A possible use is replacing the last-level point-wise multiplication of mul_fft.
I tried to do that.

The new functions actually require nn=k*n, where k can be 3 or in {5,
7, 13, 17}.

This means that it can be used only when mul_fft falls into such a size.
The results is that for some sizes there is an effect,
for other sizes there is none.
Not optimal, but an improvement anyway...

I attach a graph (fft_with_bknp1.png) obtained plotting the output of
tune/speed -s 256-131072 -t128 mpn_mul_fft
before (_old) and after (_new) the patch.


Unfortunately the current tuning criteria does not interact well with the added code. If the library is re-tuned, then there are sizes where a worst combination is chosen and the effect is a slowdown. I attach another graph (fft_retuned.png) with the timings obtained as above, the "_retuned" line was measured after re-tuning (make -C tune tune).

I don't know how the generation of FFT_TABLE works, should we change it somehow?

Ĝis,
m
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel

Reply via email to