Daniel Lichtblau <d...@wolfram.com> writes:

> My question(s): Are these NTT intermediate results cached?
No. I have a tentative interface for this in my small-prime FFT code,
which hasn't yet been integrated with GMP. I think Paul Zimmermann may
also have some patches along those lines for the Schönhage-Strassen
FFT which is used in GMP now.

> If not, is there a way to "tell" GMP to keep them around for
> subsequent reuse? I suspect such caching might be a generally useful
> thing to have if it is not already present.

Interfaces for operations on invariant numbers have been discussed for
a long time, but no public interfaces have been implemented (and few
internal ones). Caching transforms may be useful also for smaller
multiplications (I think I saw a speedup of a few % already with
Karatsuba, when I played with that). And for division, at all sizes.

> For fast GCD it could amount to a notable speed gain, I think. (I
> believe Niels remarked specifically on this but do not have the
> reference handy. Sorry.)

In GCD, there are certain large unbalanced multiplications. Splitting
the large factor and reusing the transform of the small factor is one
possible optimization. Another is transforming the 2x2 matrix elements
once (since they're used twice for a matrix x vector multiplication);
that's done in the small-prime FFT code. And yet another is using
wraparound arithmetic. (And there are surely other tricks as well.)

Hmm, looking at the GCD code now, it makes some limited use of
wraparound arithmetic, but only for the matrix x vector multiplication
in the first of the two recursive reductions in hgcd. The GMP function
to look for is mpn_mulmod_bnm1.

I also dug up some old slides. Below are some estimates extracted from
a presentation I made at KTH, May 15, 2008. I don't remember much of
the details, but for the largest savings, you'd want to not only cache
transforms, but also do some linear operations in the transform
domain.
E.g., to multiply two 2x2 matrices, first FFT-transform the 4 elements
of each matrix, *then* expand them linearly to two 7-element vectors
for Strassen multiplication. Do the 7 pointwise multiplications,
linearly transform them back to four elements, and finally do four
inverse FFTs, one for each element of the result matrix.

Regards,
/Niels

\section{Further work}
\titleslide{Further work}

\begin{frame}
  \frametitle{Matrix multiplication}
  \begin{equation*}
    M_1 \cdot M_2 \quad \text{$2\times2$ matrices}
  \end{equation*}
  \bigskip

  Assume \FFT and sizes such that transforms and pointwise
  multiplication take equal time.
  \bigskip

  \begin{tabular}{l|rrrr}
    & \FFT & \IFFT & Pointwise & Saving \\
    \hline
    Naive & 16 & 8 & 8 & 0\% \\
    Schönhage-Strassen & 14 & 7 & 7 & 12\% \\
    Invariance & 8 & 4 & 8 & 37\% \\
    S.-S. + invariance & 8 & 4 & 7 & 40\% \\
  \end{tabular}
\end{frame}

\begin{frame}
  \frametitle{Matrix-vector multiplication}
  \begin{itemize}
  \item If $\alpha, \beta$ are returned: $M$ of size $n/4$, $A', B'$
    of size $n/2$.
    \begin{equation*}
      M^{-1} \cdot \M{A \\ B} = 2^p \M{\alpha \\ \beta}
      + M^{-1} \cdot \M{A' \\ B'}
    \end{equation*}
    \begin{tabular}{l|rrl}
      & \#Mults. & Prod. size & \\
      \hline
      Naive & 4 & $3n/4$ & Wins in \FFT range \\
      Block & 8 & $n/2$ & Can use invariance \\
      S.-S. & 7 & $n/2$ & Wins in Karatsuba range \\
    \end{tabular}
    \pause
  \item If only the matrix is returned: $M$ of size $n/4$, $A, B$ of
    size $n$.
    \begin{equation*}
      \M{\alpha \\ \beta} = M^{-1} \cdot \M{A \\ B}
    \end{equation*}
    $\alpha, \beta$ are of size $3n/4$ (cancellation!). Compute
    $\bmod (2^k \pm 1)$, with transform size $\approx 3n/4$.
    \pause
  \item \alert{Same transform size}, $3n/4$, no matter if reduced
    numbers are available or not!
  \end{itemize}
\end{frame}

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel