On Tuesday 10 January 2012 19:06:57 Bill Hart wrote:
> In my code I don't alias operands with sumdiff. However I do require
> it to deal with the case n = 0.
> 

The asm version does handle n=0, but purely by chance; a fair number of mpn
functions don't.

> So do you know what the difference in timings between sumdiff and an
> add + sub is on K10 when everything is in cache?

add+sub=3.0c and sumdiff is 2.75c
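
For anyone following along, here is a minimal C sketch of what a fused
sum/difference pass looks like. It is not the MPIR code: limb_t and
sumdiff_ref are made-up names for illustration, and the return convention
(2*carry + borrow) is an assumption of the sketch. The point is that each
limb of the two sources is loaded once and used for both results, and that
n = 0 falls out of the loop condition rather than by chance.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t; /* hypothetical stand-in for a machine limb */

/* Hypothetical reference routine (not MPIR's implementation):
   sum = a + b and diff = a - b over n limbs in a single pass.
   Returns 2*carry + borrow (assumed convention for this sketch).
   With n == 0 the loop body never runs and 0 is returned. */
static int sumdiff_ref(limb_t *sum, limb_t *diff,
                       const limb_t *a, const limb_t *b, size_t n)
{
    limb_t carry = 0, borrow = 0;

    for (size_t i = 0; i < n; i++) {
        /* Load both sources once; they feed both results. */
        limb_t x = a[i], y = b[i];

        limb_t s  = x + y;
        limb_t c1 = s < x;          /* carry out of x + y */
        limb_t s2 = s + carry;
        limb_t c2 = s2 < s;         /* carry out of adding carry-in */

        limb_t d  = x - y;
        limb_t b1 = x < y;          /* borrow out of x - y */
        limb_t d2 = d - borrow;
        limb_t b2 = d < borrow;     /* borrow out of subtracting borrow-in */

        sum[i]  = s2;
        diff[i] = d2;
        carry   = c1 | c2;
        borrow  = b1 | b2;
    }
    return (int)(2 * carry + borrow);
}
```

A separate add followed by a separate sub would walk the two sources twice,
which is the extra memory traffic the fused version avoids.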


> 
> Bill.
> 
> On 10 January 2012 18:52, Jason <ja...@njkfrudils.plus.com> wrote:
> > On Tuesday 10 January 2012 14:23:35 Bill Hart wrote:
> >> I just tried removing mpn_sumdiff_n references from my code, and this
> >> slowed it down substantially. So this function is really important for
> >> the speed of the FFT.
> >
> > Given that the speedup of sumdiff is pretty small, I would guess that the
> > measured speedup is because of better use of the L1 cache. Why don't we
> > just document sumdiff in the manual and export it properly? The only
> > question is whether to allow both destinations to alias both sources. The
> > asm versions can handle this case no problem, and the existing C
> > implementation can too, although it has to allocate space to do so. The new
> > simpler C implementation can't handle this case. As the measured speedup is
> > from the better use of the L1 cache, this is a pain. However, if the define
> > HAVE_NATIVE_mpn_sumdiff_n is set then we know we can, so, like the case
> > where sumdiff was "not present" on some arch, we can just keep the present
> > code (the present new fft code) and just use the HAVE_NATIVE_mpn_sumdiff_n
> > path where we have this special case; otherwise we must use a temp var and
> > sumdiff, or even separate add and sub (as that's how it's written now).
> >
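
A rough sketch of the fallback described above, for the case where the
destinations alias the sources. Everything here is hypothetical: the names
add_n, sub_n, sumdiff_inplace and native_sumdiff, and the macro
DEMO_HAVE_NATIVE_SUMDIFF (which only stands in for the HAVE_NATIVE define),
are invented for illustration and are not MPIR's API. It assumes exact
aliasing only (a destination is either identical to a source or disjoint
from it), that sum and diff are distinct, and that the caller supplies n
limbs of scratch space.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t limb_t; /* hypothetical stand-in for a machine limb */

/* r = a + b over n limbs, returns the carry out. */
static limb_t add_n(limb_t *r, const limb_t *a, const limb_t *b, size_t n)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t x = a[i], y = b[i];
        limb_t s = x + y, c1 = s < x;
        limb_t s2 = s + carry, c2 = s2 < s;
        r[i] = s2;
        carry = c1 | c2;
    }
    return carry;
}

/* r = a - b over n limbs, returns the borrow out. */
static limb_t sub_n(limb_t *r, const limb_t *a, const limb_t *b, size_t n)
{
    limb_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t x = a[i], y = b[i];
        limb_t d = x - y, b1 = x < y;
        limb_t d2 = d - borrow, b2 = d < borrow;
        r[i] = d2;
        borrow = b1 | b2;
    }
    return borrow;
}

#ifdef DEMO_HAVE_NATIVE_SUMDIFF
/* Hypothetical native routine assumed to tolerate aliased operands. */
extern int native_sumdiff(limb_t *sum, limb_t *diff,
                          const limb_t *a, const limb_t *b, size_t n);
#endif

/* sum = a + b, diff = a - b, where sum and diff may each be the same
   pointer as a or b. tmp must hold n limbs and overlap nothing else.
   Returns 2*carry + borrow (assumed convention for this sketch). */
static int sumdiff_inplace(limb_t *sum, limb_t *diff,
                           const limb_t *a, const limb_t *b,
                           size_t n, limb_t *tmp)
{
#ifdef DEMO_HAVE_NATIVE_SUMDIFF
    (void)tmp;
    return native_sumdiff(sum, diff, a, b, n);
#else
    /* Compute the sum into scratch so the sources stay intact. */
    limb_t carry = add_n(tmp, a, b, n);
    /* sub_n reads limb i of both sources before writing limb i of diff,
       so diff may be the same pointer as a or b. */
    limb_t borrow = sub_n(diff, a, b, n);
    /* Only now overwrite whatever sum aliases. */
    memcpy(sum, tmp, n * sizeof(limb_t));
    return (int)(2 * carry + borrow);
#endif
}
```

The cost of the portable branch is the extra pass through tmp, which is the
extra cache traffic the fused, alias-tolerant native version avoids.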
> >>
> >> Unfortunately it is not exported by MPIR and even though it is defined
> >> for all processors, it is mpn_sumdiff_n in some libraries and
> >> _gmpn_sumdiff_n in others and __gmpn_sumdiff_n in others. So this is a
> >> total pain in the neck. I'm not sure what the best solution is.
> >>
> >> Bill.
> >>
> >> On 9 January 2012 18:01, Bill Hart <goodwillh...@googlemail.com> wrote:
> >> > I wouldn't worry about it. It is possible I overwrote my timings file
> >> > and that the times are not affected after all.
> >> >
> >> > On 9 January 2012 17:56, Jason <ja...@njkfrudils.plus.com> wrote:
> >> >> On Sunday 08 January 2012 11:30:05 Bill Hart wrote:
> >> >>> I decided to try the FFT without addsub_n and it seems to actually go
> >> >>> consistently about 3% faster, which is totally mysterious. So I have
> >> >>> removed it from the two files it is defined in:
> >> >>
> >> >>
> >> >> That's very strange. I assume this is on a K10; what kind of sizes are
> >> >> we talking about?
> >> >>
> >> >>>
> >> >>> ifft_mfa_truncate_sqrt2.c
> >> >>> ifft_truncate_sqrt2.c
> >> >>>
> >> >>> As sumdiff_n seems to be defined for all platforms as far back as the
> >> >>> MPIR 2.1 series (even if it is not explicitly exported), I can just
> >> >>> extern this in my flint code (and of course it won't be a problem in
> >> >>> mpir).
> >> >>>
> >> >>> So the fft should build on all systems now.
> >> >>>
> >> >>> Bill.
> >> >>>
> >> >>>
> >>
> >>
> 
> 
