On Tuesday 10 January 2012 19:06:57 Bill Hart wrote:
> In my code I don't alias operands with sumdiff. However I do require
> it to deal with the case n = 0.
>
> So do you know what the difference in timings between sumdiff and an
> add + sub is on K10 when everything is in cache?
>
> Bill.
>
> On 10 January 2012 18:52, Jason <ja...@njkfrudils.plus.com> wrote:
> > On Tuesday 10 January 2012 14:23:35 Bill Hart wrote:
> >> I just tried removing mpn_sumdiff_n references from my code, and this
> >> slowed it down substantially. So this function is really important for
> >> the speed of the FFT.
> >
> > Given that the speedup of sumdiff itself is pretty small, I would guess
> > that the measured speedup is because of better use of the L1 cache. Why
> > don't we just document sumdiff in the manual and export it properly? The
> > only question is whether to allow both destinations to alias both
> > sources. The asm versions can handle this case no problem, and the
> > existing C implementation can too, although it has to allocate space to
> > do so. The new, simpler C implementation can't handle this case. As the
> > measured speedup is from the better use of the L1 cache, this is a pain.
> > However, if the define HAVE_NATIVE_mpn_sumdiff_t is set then we know we
> > can, so, as in the case where sumdiff was "not present" on some arch, we
> > can just keep the present code (the present new fft code) and use
> > HAVE_NATIVE_mpn_sumdiff only where we have this special case. Otherwise
> > we must use a temp var and sumdiff, or even separate add and sub (as
> > that's how it's written now).
> >
> >>
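To make the aliasing point concrete, here is a minimal portable C sketch of a
one-pass sum/difference. It is not MPIR's implementation: limb_t and
sumdiff_n_sketch are hypothetical names, and the 2*carry + borrow return value
is an assumption made for this example. Both source limbs are read before
either destination limb is written, so the destinations may alias the sources
at the same index, and n = 0 simply falls through the loop and returns 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for mp_limb_t */

/* s = x + y and d = x - y over n limbs, in a single pass.
   Returns 2*carry + borrow (a convention assumed for this sketch). */
unsigned sumdiff_n_sketch(limb_t *s, limb_t *d,
                          const limb_t *x, const limb_t *y, size_t n)
{
    limb_t cy = 0, bw = 0;

    assert(n == 0 || (s && d && x && y));

    for (size_t i = 0; i < n; i++) {
        limb_t a = x[i], b = y[i];  /* read both sources before any write */

        limb_t t = a + b;           /* sum, then add incoming carry */
        limb_t c1 = t < a;
        limb_t sum = t + cy;
        limb_t c2 = sum < t;

        limb_t u = a - b;           /* difference, then subtract borrow */
        limb_t b1 = a < b;
        limb_t diff = u - bw;
        limb_t b2 = u < bw;

        s[i] = sum;                 /* safe even if s or d aliases x or y */
        d[i] = diff;

        cy = c1 | c2;               /* at most one of each pair can be set */
        bw = b1 | b2;
    }
    return (unsigned)(2 * cy + bw);
}
```

Each limb of x and y is loaded once and used for both results, which is where
any cache benefit over a separate add followed by a sub would come from.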
If the L1 cache saving is significant, then many other functions could also be
adapted for a real speedup, but only when the data is larger than the L1 cache.

> >> Unfortunately it is not exported by MPIR and even though it is defined
> >> for all processors, it is mpn_sumdiff_n in some libraries and
> >> _gmpn_sumdiff_n in others and __gmpn_sumdiff_n in others. So this is a
> >> total pain in the neck. I'm not sure what the best solution is.
> >>
> >> Bill.
> >>
> >> On 9 January 2012 18:01, Bill Hart <goodwillh...@googlemail.com> wrote:
> >> > I wouldn't worry about it. It is possible I overwrote my timings file
> >> > and that the times are not affected after all.
> >> >
> >> > On 9 January 2012 17:56, Jason <ja...@njkfrudils.plus.com> wrote:
> >> >> On Sunday 08 January 2012 11:30:05 Bill Hart wrote:
> >> >>> I decided to try the FFT without addsub_n and it seems to actually go
> >> >>> consistently about 3% faster, which is totally mysterious. So I have
> >> >>> removed it from the two files it is defined in:
> >> >>
> >> >> That's very strange. I assume this is on a K10; what kind of sizes
> >> >> are we talking about?
> >> >>
> >> >>>
> >> >>> ifft_mfa_truncate_sqrt2.c
> >> >>> ifft_truncate_sqrt2.c
> >> >>>
> >> >>> As sumdiff_n seems to be defined for all platforms as far back as the
> >> >>> MPIR 2.1 series (even if it is not explicitly exported), I can just
> >> >>> extern this in my flint code (and of course it won't be a problem in
> >> >>> mpir).
> >> >>>
> >> >>> So the fft should build on all systems now.
> >> >>>
> >> >>> Bill.

-- 
You received this message because you are subscribed to the Google Groups "mpir-devel" group.