In my code I don't alias operands with sumdiff. However, I do require it to handle the case n = 0.
So do you know what the difference in timings between sumdiff and an
add + sub is on K10 when everything is in cache?

Bill.

On 10 January 2012 18:52, Jason <ja...@njkfrudils.plus.com> wrote:
> On Tuesday 10 January 2012 14:23:35 Bill Hart wrote:
>> I just tried removing mpn_sumdiff_n references from my code, and this
>> slowed it down substantially. So this function is really important for
>> the speed of the FFT.
>
> Given that the speedup of sumdiff is pretty small, I would guess that the
> measured speedup is because of better use of the L1 cache. Why don't we
> just document sumdiff in the manual and export it properly? The only
> question is whether to allow both destinations to alias both sources. The
> asm versions can handle this case no problem, and the existing C
> implementation can too, although it has to allocate space to do so. The
> new, simpler C implementation can't handle this case. As the measured
> speedup is from the better use of the L1 cache, this is a pain. However,
> if the define HAVE_NATIVE_mpn_sumdiff_t is set then we know we can. So,
> like the case where sumdiff was "not present" on some arch, we can just
> keep the present code (the present new fft code) and just use the
> HAVE_NATIVE_mpn_sumdiff where we have this special case; otherwise we
> must use a temp var and sumdiff, or even separate add and sub (as that's
> how it's written now).
>
>>
>> Unfortunately it is not exported by MPIR and even though it is defined
>> for all processors, it is mpn_sumdiff_n in some libraries and
>> _gmpn_sumdiff_n in others and __gmpn_sumdiff_n in others. So this is a
>> total pain in the neck. I'm not sure what the best solution is.
>>
>> Bill.
>>
>> On 9 January 2012 18:01, Bill Hart <goodwillh...@googlemail.com> wrote:
>> > I wouldn't worry about it. It is possible I overwrote my timings file
>> > and that the times are not affected after all.
>> >
>> > On 9 January 2012 17:56, Jason <ja...@njkfrudils.plus.com> wrote:
>> >> On Sunday 08 January 2012 11:30:05 Bill Hart wrote:
>> >>> I decided to try the FFT without addsub_n and it seems to actually go
>> >>> consistently about 3% faster, which is totally mysterious. So I have
>> >>> removed it from the two files it is defined in:
>> >>
>> >> That's very strange. I assume this is on a K10; what kind of sizes are
>> >> we talking about?
>> >>
>> >>>
>> >>> ifft_mfa_truncate_sqrt2.c
>> >>> ifft_truncate_sqrt2.c
>> >>>
>> >>> As sumdiff_n seems to be defined for all platforms as far back as the
>> >>> MPIR 2.1 series (even if it is not explicitly exported), I can just
>> >>> extern this in my flint code (and of course it won't be a problem in
>> >>> mpir).
>> >>>
>> >>> So the fft should build on all systems now.
>> >>>
>> >>> Bill.
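
For reference, the aliasing question raised in this thread (both destinations aliasing both sources) can be illustrated in portable C. What follows is only a minimal sketch and is not MPIR's code: limb_t, ref_add_n, ref_sub_n, ref_sumdiff_n and two_pass_sumdiff_inplace are hypothetical names, 64-bit limbs are assumed, and the return convention 2*carry + borrow is an assumption about how sumdiff reports its result. The fused loop loads both source limbs before storing anything, so it tolerates s == x, d == y; the two-pass version needs a temporary because the add overwrites x before the sub has read it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t; /* hypothetical stand-in for mp_limb_t */

/* s = x + y over n limbs; returns the carry out (0 or 1). */
static limb_t ref_add_n(limb_t *s, const limb_t *x, const limb_t *y, size_t n)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t a = x[i], b = y[i];
        limb_t t = a + b, r = t + carry;
        carry = (t < a) | (r < t);
        s[i] = r;
    }
    return carry;
}

/* d = x - y over n limbs; returns the borrow out (0 or 1). */
static limb_t ref_sub_n(limb_t *d, const limb_t *x, const limb_t *y, size_t n)
{
    limb_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t a = x[i], b = y[i];
        limb_t u = a - b, r = u - borrow;
        borrow = (a < b) | (u < borrow);
        d[i] = r;
    }
    return borrow;
}

/* Fused single pass: s = x + y and d = x - y, each limb loaded once.
   Safe for s == x, d == y because both loads precede both stores.
   Returns 2*carry + borrow (assumed convention); n == 0 returns 0. */
static limb_t ref_sumdiff_n(limb_t *s, limb_t *d,
                            const limb_t *x, const limb_t *y, size_t n)
{
    limb_t carry = 0, borrow = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t a = x[i], b = y[i];
        limb_t t = a + b, u = a - b;
        limb_t sum = t + carry, diff = u - borrow;
        carry = (t < a) | (sum < t);
        borrow = (a < b) | (u < borrow);
        s[i] = sum;
        d[i] = diff;
    }
    return 2 * carry + borrow;
}

/* Two-pass in-place fallback: x = x + y, y = x_old - y.
   tmp must hold n limbs; it preserves x_old for the subtraction. */
static limb_t two_pass_sumdiff_inplace(limb_t *x, limb_t *y,
                                       limb_t *tmp, size_t n)
{
    for (size_t i = 0; i < n; i++)
        tmp[i] = x[i];
    limb_t carry = ref_add_n(x, tmp, y, n);
    limb_t borrow = ref_sub_n(y, tmp, y, n);
    return 2 * carry + borrow;
}

/* Self-check: fused and two-pass in-place results agree; n == 0 is a no-op. */
static void sumdiff_selfcheck(void)
{
    limb_t x1[2] = {UINT64_MAX, 1}, y1[2] = {1, 2};
    limb_t x2[2] = {UINT64_MAX, 1}, y2[2] = {1, 2}, tmp[2];
    limb_t r1 = ref_sumdiff_n(x1, y1, x1, y1, 2);
    limb_t r2 = two_pass_sumdiff_inplace(x2, y2, tmp, 2);
    assert(r1 == r2);
    for (size_t i = 0; i < 2; i++)
        assert(x1[i] == x2[i] && y1[i] == y2[i]);
    assert(ref_sumdiff_n(x1, y1, x1, y1, 0) == 0);
}
```

The fused version touches each operand limb once, where the two-pass version walks the data twice and copies it as well. That is in line with the guess above that the gain comes from L1 cache use more than from the arithmetic, though this sketch does not measure anything.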