[mpir-devel] Re: FFT progress

Bill Hart Tue, 27 Dec 2011 07:49:23 -0800

David clarified the timing in that last slide of his. Apparently it
was just a misconfiguration external to MPIR. He reports that the
correct time for MPIR is 224s, which is in line with what we might
expect from the figures I gave for AMD chips.


Bill.

On 26 December 2011 14:44, Bill Hart <goodwillh...@googlemail.com> wrote:
> A few days ago, David Harvey gave a really nice talk at Warwick about
> some new number theoretic transform code he has been working on. It
> uses some nice tricks to get *really* fast butterflies. Even a C
> implementation turned out to be quite fast, although he is also
> working on some assembly optimised versions. Two really nice features
> are very low memory usage and ability to easily parallelise it (which
> he has done).
>
> The slides are available here:
>
> http://wiki.sagemath.org/SageFlintDays/slides?action=AttachFile&do=view&target=fastntt.pdf
>
> On the final slide you will find some timings of his new code versus
> mpir-2.4.0 and gmp-5.0.2. On that machine (an Intel Xeon) it appears
> that GMP is way faster than MPIR (and his code is slightly faster
> again, and much faster when parallelised).
>
> Anyhow, I decided to do all my timings for gmp-5.0.2 to see how it
> compares with mpir-2.4.0 and my new fft on my machine (a K10-2). I
> don't have a copy of David's code, so I can't compare with that at
> present.
>
> bits                iters   mpir    new_fft gmp
> 195840          1000  1.149s 1.105s 0.997s
> 261120          1000  1.483s 1.415s 1.396s
> 391296          100    0.261s 0.248s 0.282s
> 521728          100    0.344s 0.315s 0.411s
> 782592          100    0.577s 0.539s 0.628s
> 1043456        100    0.706s 0.688s 0.848s
> 1569024        100    1.229s 1.153s 1.317s
> 2092032        100    1.543s 1.462s 2.765s
> 3127296        10     0.283s 0.286s 0.408s
> 4169728        10     0.357s 0.350s 0.543s
> 6273024        10     0.621s 0.615s 0.843s
> 8364032        10     0.831s 0.799s 1.156s
> 12539904       10    1.441s 1.471s 1.798s
> 16719872       1      0.230s 0.209s 0.288s
> 25122816       1      0.379s 0.357s 0.434s
> 33497088       1      0.524s 0.462s 0.646s
> 50245632       1      0.833s 0.782s 1.035s
> 66994176       1      1.596s 0.967s 1.358s
> 100577280     1      1.906s 1.704s 2.177s
> 134103040     1      2.784s 2.162s 2.984s
> 201129984     1      3.971s 3.524s 4.536s
> 268173312     1      5.146s 4.572s 5.781s
> 402456576     1      7.548s 7.041s 9.867s
> 536608768     1      9.841s 9.428s 12.71s
> 804913152     1     15.48s 14.18s 20.06s
> 1073217536   1     21.17s 19.33s 27.19s
> 1610219520   1     31.64s 30.83s 43.37s
> 2146959360   1     43.25s 39.17s 57.66s
> 3220340736   1     70.14s 61.39s 92.94s
> 4293787648   1     96.00s 80.15s 146.1s
> 6441566208   1    150.2s 136.9s 217.5s
> 8588754944   1    208.4s 183.3s 312.8s
> 12883132416 1    327.4s 290.2s 447.7s
> 17177509888 1    485.0s 382.2s 614.2s
>
> I've checked multiple ways that I am really compiling against the
> right GMP. It also appears that GMP is configuring correctly on my
> machine. So it looks very much like either MPIR misconfigured on
> David's machine, or the GMP FFT is much, much better with Intel
> machines or the MPIR FFT is way, way worse with Intel machines.
>
> I'll try to get some configuration details from David and eventually
> I'll do some timings on an Intel machine to see how everything
> compares.
>
> Last night I tried combining pointwise multiplications with the
> inverse transform to improve memory locality. However, this failed
> because calling mpir's Nussbaumer convolution rapidly in succession
> results in an "Unable to allocate memory" error. I have no idea what
> causes this (it is certainly not out of memory). The code was
> exhibiting some very odd behaviour which I was unable to account for.
> My best guess is that it is to do with too many allocations on the
> stack or something. I am doubtlessly calling the function in a way it
> was not intended to be called. Anyhow, this just underscores the need
> to write a fast Nussbaumer convolution, which I currently do not have
> time to do (I don't need it for my application).
>
> Bill.

-- 
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com.
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en.

[mpir-devel] Re: FFT progress

Reply via email to