I've now added the negacyclic transforms to the flint repo and cleaned it all up ready for some tuning code to be written.
At the moment some hard coded tuning values are present in mulmod_2expp1.c. There are no tuning values for the main multiplication routine because there is no wrapper function mpn_mul_fft or whatever at this point (it's trivial to write one, it just gets the appropriate tuning values and calls fft_mul_mfa_sqrt2_truncate). A port to MPIR would involve the following: * replace FLINT_BITS with GMP_LIMB_BITS throughout, similarly with FLINT_MIN -> MIN * replace malloc with TMP_MARK, TMP_LIMBS_BALLOC throughout * replace free with TMP_FREE throughout * #include gmp-impl.h throughout * copy n_revbin from the ulong_extras directory (I hereby license it BSD as per the remainder of the FFT implementation) * add alternative code paths for mpn_sumdiff_n and mpn_addsub_n where these are used but not available on certain architectures Eventually when I add tuning code it will need rewriting for MPIR. It's trivial to tune this FFT. Obviously as two thirds of the time is spent in butterflies, any assembly optimisation of these will speed things up. I don't plan to do this myself. Bill. On 1 January 2012 02:28, Bill Hart <goodwillh...@googlemail.com> wrote: > I have finished implementing and debugging a new Nussbaumer > convolution. It isn't the absolutely fastest possible thing one can > do, but it is still slightly better than what is in mpir. > > The fourth column in the following table shows the times for the new > code. It is only used from a certain point on hence no changes to the > first half of the table: > > bits mpir new_fft gmp nussbaumer > 195840 1.149s 1.105s 0.997s > 261120 1.483s 1.415s 1.396s > 391296 0.261s 0.248s 0.282s > 521728 0.344s 0.315s 0.411s > 782592 0.577s 0.539s 0.628s > 1043456 0.706s 0.688s 0.848s > 1569024 1.229s 1.153s 1.317s > 2092032 1.543s 1.462s 2.765s > 3127296 0.283s 0.286s 0.408s > 4169728 0.357s 0.350s 0.543s > 6273024 0.621s 0.615s 0.843s > 8364032 0.831s 0.799s 1.156s > 12539904 1.441s 1.471s 1.798s > 16719872 0.230s 0.209s 0.288s > 25122816 0.379s 0.357s 0.434s > 33497088 0.524s 0.462s 0.646s > 50245632 0.833s 0.782s 1.035s > 66994176 1.596s 0.967s 1.358s > 100577280 1.906s 1.704s 2.177s 1.684s > 134103040 2.784s 2.162s 2.984s 2.154s > 201129984 3.971s 3.460s 4.536s 3.349s > 268173312 5.146s 4.419s 5.781s 4.350s > 402456576 7.548s 7.150s 9.867s 6.717s > 536608768 9.841s 9.264s 12.71s 8.804s > 804913152 15.48s 14.71s 20.06s 13.92s > 1073217536 21.17s 18.85s 27.19s 17.96s > 1610219520 31.64s 30.64s 43.37s 29.64s > 2146959360 43.25s 39.17s 57.66s 39.07s > 3220340736 70.14s 61.39s 92.94s 61.17s > 4293787648 96.00s 80.15s 146.1s 79.13s > 6441566208 150.2s 136.9s 217.5s 132.9s > 8588754944 208.4s 183.3s 312.8s 177.2s > 12883132416 327.4s 290.2s 447.7s 281.8s > 17177509888 485.0s 382.2s 614.2s 372.6s > > The final step is to write a few missing test functions and to write > tuning code. (Note the Nussbaumer code is in the mpir-fft repo not the > flint repo at this stage. I will fix this tomorrow.) > > Happy New Year! > > Bill. -- You received this message because you are subscribed to the Google Groups "mpir-devel" group. To post to this group, send email to mpir-devel@googlegroups.com. To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en.