I have cleaned up my FFT massively and it now takes just over 2000
lines of code including test code. You can view the code here:

http://selmer.warwick.ac.uk/gitweb/flint2.git?a=tree;f=fft;h=1fcb640bdbf46137b01611a7ff7e1cc90fd2e4c0;hb=HEAD

Just a few minor test functions to go, some tuning code and a nice
integer mul wrapper and it will be fully usable. I'm thinking of just
doing a Nussbaumer convolution now rather than leaving it until later.

Bill.

On 27 December 2011 15:47, Bill Hart <goodwillh...@googlemail.com> wrote:
> David clarified the timing in that last slide of his. Apparently it
> was just a misconfiguration external to MPIR. He reports that the
> correct time for MPIR is 224s, which is in line with what we might
> expect from the figures I gave for AMD chips.
>
> Bill.
>
> On 26 December 2011 14:44, Bill Hart <goodwillh...@googlemail.com> wrote:
>> A few days ago, David Harvey gave a really nice talk at Warwick about
>> some new number theoretic transform code he has been working on. It
>> uses some nice tricks to get *really* fast butterflies. Even a C
>> implementation turned out to be quite fast, although he is also
>> working on some assembly optimised versions. Two really nice features
>> are very low memory usage and ability to easily parallelise it (which
>> he has done).
>>
>> The slides are available here:
>>
>> http://wiki.sagemath.org/SageFlintDays/slides?action=AttachFile&do=view&target=fastntt.pdf
>>
>> On the final slide you will find some timings of his new code versus
>> mpir-2.4.0 and gmp-5.0.2. On that machine (an Intel Xeon) it appears
>> that GMP is way faster than MPIR (and his code is slightly faster
>> again, and much faster when parallelised).
>>
>> Anyhow, I decided to do all my timings for gmp-5.0.2 to see how it
>> compares with mpir-2.4.0 and my new fft on my machine (a K10-2). I
>> don't have a copy of David's code, so I can't compare with that at
>> present.
>>
>> bits                iters   mpir    new_fft gmp
>> 195840          1000  1.149s 1.105s 0.997s
>> 261120          1000  1.483s 1.415s 1.396s
>> 391296          100    0.261s 0.248s 0.282s
>> 521728          100    0.344s 0.315s 0.411s
>> 782592          100    0.577s 0.539s 0.628s
>> 1043456        100    0.706s 0.688s 0.848s
>> 1569024        100    1.229s 1.153s 1.317s
>> 2092032        100    1.543s 1.462s 2.765s
>> 3127296        10     0.283s 0.286s 0.408s
>> 4169728        10     0.357s 0.350s 0.543s
>> 6273024        10     0.621s 0.615s 0.843s
>> 8364032        10     0.831s 0.799s 1.156s
>> 12539904       10    1.441s 1.471s 1.798s
>> 16719872       1      0.230s 0.209s 0.288s
>> 25122816       1      0.379s 0.357s 0.434s
>> 33497088       1      0.524s 0.462s 0.646s
>> 50245632       1      0.833s 0.782s 1.035s
>> 66994176       1      1.596s 0.967s 1.358s
>> 100577280     1      1.906s 1.704s 2.177s
>> 134103040     1      2.784s 2.162s 2.984s
>> 201129984     1      3.971s 3.524s 4.536s
>> 268173312     1      5.146s 4.572s 5.781s
>> 402456576     1      7.548s 7.041s 9.867s
>> 536608768     1      9.841s 9.428s 12.71s
>> 804913152     1     15.48s 14.18s 20.06s
>> 1073217536   1     21.17s 19.33s 27.19s
>> 1610219520   1     31.64s 30.83s 43.37s
>> 2146959360   1     43.25s 39.17s 57.66s
>> 3220340736   1     70.14s 61.39s 92.94s
>> 4293787648   1     96.00s 80.15s 146.1s
>> 6441566208   1    150.2s 136.9s 217.5s
>> 8588754944   1    208.4s 183.3s 312.8s
>> 12883132416 1    327.4s 290.2s 447.7s
>> 17177509888 1    485.0s 382.2s 614.2s
>>
>> I've checked multiple ways that I am really compiling against the
>> right GMP. It also appears that GMP is configuring correctly on my
>> machine. So it looks very much like either MPIR misconfigured on
>> David's machine, or the GMP FFT is much, much better with Intel
>> machines or the MPIR FFT is way, way worse with Intel machines.
>>
>> I'll try to get some configuration details from David and eventually
>> I'll do some timings on an Intel machine to see how everything
>> compares.
>>
>> Last night I tried combining pointwise multiplications with the
>> inverse transform to improve memory locality. However, this failed
>> because calling mpir's Nussbaumer convolution rapidly in succession
>> results in an "Unable to allocate memory" error. I have no idea what
>> causes this (it is certainly not out of memory). The code was
>> exhibiting some very odd behaviour which I was unable to account for.
>> My best guess is that it is to do with too many allocations on the
>> stack or something. I am doubtlessly calling the function in a way it
>> was not intended to be called. Anyhow, this just underscores the need
>> to write a fast Nussbaumer convolution, which I currently do not have
>> time to do (I don't need it for my application).
>>
>> Bill.

-- 
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com.
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en.

Reply via email to