On Monday 02 January 2012 07:30:03 Bill Hart wrote: > I've now added the negacyclic transforms to the flint repo and cleaned > it all up ready for some tuning code to be written. > > At the moment some hard coded tuning values are present in > mulmod_2expp1.c. There are no tuning values for the main > multiplication routine because there is no wrapper function > mpn_mul_fft or whatever at this point (it's trivial to write one, it > just gets the appropriate tuning values and calls > fft_mul_mfa_sqrt2_truncate). > > A port to MPIR would involve the following: > > * replace FLINT_BITS with GMP_LIMB_BITS throughout, similarly with > FLINT_MIN -> MIN > * replace malloc with TMP_MARK, TMP_LIMBS_BALLOC throughout > * replace free with TMP_FREE throughout > * #include gmp-impl.h throughout > * copy n_revbin from the ulong_extras directory (I hereby license it > BSD as per the remainder of the FFT implementation) > * add alternative code paths for mpn_sumdiff_n and mpn_addsub_n where > these are used but not available on certain architectures > > Eventually when I add tuning code it will need rewriting for MPIR. > It's trivial to tune this FFT. > > Obviously as two thirds of the time is spent in butterflies, any > assembly optimisation of these will speed things up. I don't plan to > do this myself. >
I can give it go , starting this week and also the fast naive acyclic convolution for 1 and 2 limb coefficients. Jason > Bill. > > On 1 January 2012 02:28, Bill Hart <goodwillh...@googlemail.com> wrote: > > I have finished implementing and debugging a new Nussbaumer > > convolution. It isn't the absolutely fastest possible thing one can > > do, but it is still slightly better than what is in mpir. > > > > The fourth column in the following table shows the times for the new > > code. It is only used from a certain point on hence no changes to the > > first half of the table: > > > > bits mpir new_fft gmp nussbaumer > > 195840 1.149s 1.105s 0.997s > > 261120 1.483s 1.415s 1.396s > > 391296 0.261s 0.248s 0.282s > > 521728 0.344s 0.315s 0.411s > > 782592 0.577s 0.539s 0.628s > > 1043456 0.706s 0.688s 0.848s > > 1569024 1.229s 1.153s 1.317s > > 2092032 1.543s 1.462s 2.765s > > 3127296 0.283s 0.286s 0.408s > > 4169728 0.357s 0.350s 0.543s > > 6273024 0.621s 0.615s 0.843s > > 8364032 0.831s 0.799s 1.156s > > 12539904 1.441s 1.471s 1.798s > > 16719872 0.230s 0.209s 0.288s > > 25122816 0.379s 0.357s 0.434s > > 33497088 0.524s 0.462s 0.646s > > 50245632 0.833s 0.782s 1.035s > > 66994176 1.596s 0.967s 1.358s > > 100577280 1.906s 1.704s 2.177s 1.684s > > 134103040 2.784s 2.162s 2.984s 2.154s > > 201129984 3.971s 3.460s 4.536s 3.349s > > 268173312 5.146s 4.419s 5.781s 4.350s > > 402456576 7.548s 7.150s 9.867s 6.717s > > 536608768 9.841s 9.264s 12.71s 8.804s > > 804913152 15.48s 14.71s 20.06s 13.92s > > 1073217536 21.17s 18.85s 27.19s 17.96s > > 1610219520 31.64s 30.64s 43.37s 29.64s > > 2146959360 43.25s 39.17s 57.66s 39.07s > > 3220340736 70.14s 61.39s 92.94s 61.17s > > 4293787648 96.00s 80.15s 146.1s 79.13s > > 6441566208 150.2s 136.9s 217.5s 132.9s > > 8588754944 208.4s 183.3s 312.8s 177.2s > > 12883132416 327.4s 290.2s 447.7s 281.8s > > 17177509888 485.0s 382.2s 614.2s 372.6s > > > > The final step is to write a few missing test functions and to write > > tuning code. (Note the Nussbaumer code is in the mpir-fft repo not the > > flint repo at this stage. I will fix this tomorrow.) > > > > Happy New Year! > > > > Bill. > > -- You received this message because you are subscribed to the Google Groups "mpir-devel" group. To post to this group, send email to mpir-devel@googlegroups.com. To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en.