I have cleaned up my FFT massively and it now comes to just over 2000 lines of code, including test code. You can view the code here:
http://selmer.warwick.ac.uk/gitweb/flint2.git?a=tree;f=fft;h=1fcb640bdbf46137b01611a7ff7e1cc90fd2e4c0;hb=HEAD

Just a few minor test functions, some tuning code and a nice integer mul wrapper to go, and it will be fully usable. I'm thinking of just doing a Nussbaumer convolution now rather than leaving it until later.

Bill.

On 27 December 2011 15:47, Bill Hart <goodwillh...@googlemail.com> wrote:
> David clarified the timing in that last slide of his. Apparently it
> was just a misconfiguration external to MPIR. He reports that the
> correct time for MPIR is 224s, which is in line with what we might
> expect from the figures I gave for AMD chips.
>
> Bill.
>
> On 26 December 2011 14:44, Bill Hart <goodwillh...@googlemail.com> wrote:
>> A few days ago, David Harvey gave a really nice talk at Warwick about
>> some new number theoretic transform code he has been working on. It
>> uses some nice tricks to get *really* fast butterflies. Even a C
>> implementation turned out to be quite fast, although he is also
>> working on some assembly-optimised versions. Two really nice features
>> are very low memory usage and the ability to parallelise it easily
>> (which he has done).
>>
>> The slides are available here:
>>
>> http://wiki.sagemath.org/SageFlintDays/slides?action=AttachFile&do=view&target=fastntt.pdf
>>
>> On the final slide you will find some timings of his new code versus
>> mpir-2.4.0 and gmp-5.0.2. On that machine (an Intel Xeon) it appears
>> that GMP is way faster than MPIR (and his code is slightly faster
>> again, and much faster when parallelised).
>>
>> Anyhow, I decided to run all my timings against gmp-5.0.2 to see how
>> it compares with mpir-2.4.0 and my new FFT on my machine (a K10-2). I
>> don't have a copy of David's code, so I can't compare with that at
>> present.
>>
>>         bits  iters    mpir  new_fft     gmp
>>       195840   1000  1.149s   1.105s  0.997s
>>       261120   1000  1.483s   1.415s  1.396s
>>       391296    100  0.261s   0.248s  0.282s
>>       521728    100  0.344s   0.315s  0.411s
>>       782592    100  0.577s   0.539s  0.628s
>>      1043456    100  0.706s   0.688s  0.848s
>>      1569024    100  1.229s   1.153s  1.317s
>>      2092032    100  1.543s   1.462s  2.765s
>>      3127296     10  0.283s   0.286s  0.408s
>>      4169728     10  0.357s   0.350s  0.543s
>>      6273024     10  0.621s   0.615s  0.843s
>>      8364032     10  0.831s   0.799s  1.156s
>>     12539904     10  1.441s   1.471s  1.798s
>>     16719872      1  0.230s   0.209s  0.288s
>>     25122816      1  0.379s   0.357s  0.434s
>>     33497088      1  0.524s   0.462s  0.646s
>>     50245632      1  0.833s   0.782s  1.035s
>>     66994176      1  1.596s   0.967s  1.358s
>>    100577280      1  1.906s   1.704s  2.177s
>>    134103040      1  2.784s   2.162s  2.984s
>>    201129984      1  3.971s   3.524s  4.536s
>>    268173312      1  5.146s   4.572s  5.781s
>>    402456576      1  7.548s   7.041s  9.867s
>>    536608768      1  9.841s   9.428s  12.71s
>>    804913152      1  15.48s   14.18s  20.06s
>>   1073217536      1  21.17s   19.33s  27.19s
>>   1610219520      1  31.64s   30.83s  43.37s
>>   2146959360      1  43.25s   39.17s  57.66s
>>   3220340736      1  70.14s   61.39s  92.94s
>>   4293787648      1  96.00s   80.15s  146.1s
>>   6441566208      1  150.2s   136.9s  217.5s
>>   8588754944      1  208.4s   183.3s  312.8s
>>  12883132416      1  327.4s   290.2s  447.7s
>>  17177509888      1  485.0s   382.2s  614.2s
>>
>> I've checked multiple ways that I am really compiling against the
>> right GMP. It also appears that GMP is configuring correctly on my
>> machine. So it looks very much like either MPIR misconfigured on
>> David's machine, or the GMP FFT is much, much better on Intel
>> machines, or the MPIR FFT is way, way worse on Intel machines.
>>
>> I'll try to get some configuration details from David, and eventually
>> I'll do some timings on an Intel machine to see how everything
>> compares.
>>
>> Last night I tried combining the pointwise multiplications with the
>> inverse transform to improve memory locality.
>> However, this failed: calling MPIR's Nussbaumer convolution rapidly
>> in succession results in an "Unable to allocate memory" error. I have
>> no idea what causes this (it is certainly not out of memory). The
>> code was exhibiting some very odd behaviour which I was unable to
>> account for. My best guess is that it has to do with too many
>> allocations on the stack, or something similar. I am doubtless
>> calling the function in a way it was not intended to be called.
>> Anyhow, this just underscores the need to write a fast Nussbaumer
>> convolution, which I currently do not have time to do (I don't need
>> it for my application).
>>
>> Bill.

--
You received this message because you are subscribed to the Google Groups "mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com.
To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en.
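For readers unfamiliar with the term used in the thread: a Nussbaumer convolution computes the negacyclic product in Z[x]/(x^n + 1), i.e. a wrapped convolution where terms that overflow past degree n come back with a sign flip (since x^n = -1). The sketch below is only the quadratic-time definition as a correctness reference — it is not Nussbaumer's fast recursive algorithm, and not MPIR's code.

```python
def negacyclic_convolution(a, b):
    """Negacyclic (Nussbaumer-style) product of a and b in Z[x]/(x^n + 1).

    c[k] = sum over i + j == k of a[i]*b[j], where terms with
    i + j >= n wrap around to index i + j - n with a minus sign.
    Quadratic-time reference implementation for checking fast code.
    """
    n = len(a)
    assert len(b) == n, "operands must have equal length n"
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] += a[i] * b[j]
            else:
                c[k - n] -= a[i] * b[j]  # x^n = -1: wrap with sign flip
    return c
```

A reference like this is handy when debugging a fast implementation: any candidate Nussbaumer routine should agree with it on small random inputs.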