In my code I don't alias operands with sumdiff. However I do require
it to deal with the case n = 0.
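
For concreteness, here is a minimal sketch of a combined sum/difference
pass over limb arrays. It is not MPIR's implementation: the names limb_t
and sumdiff_sketch are hypothetical, and the 2*carry + borrow return value
is an assumed convention. It only illustrates the two points above: n = 0
falls straight through the loop, and each limb is loaded once to feed both
the add and the sub.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limb type standing in for mp_limb_t. */
typedef uint64_t limb_t;

/* One pass over n limbs: sum = x + y and diff = x - y.
   Returns 2*carry + borrow (assumed convention, not taken from MPIR).
   With n == 0 the loop body never runs and 0 is returned. */
limb_t sumdiff_sketch(limb_t *sum, limb_t *diff,
                      const limb_t *x, const limb_t *y, size_t n)
{
    limb_t carry = 0, borrow = 0;

    for (size_t i = 0; i < n; i++) {
        /* Load both inputs before any store. */
        limb_t a = x[i], b = y[i];

        limb_t s  = a + b;        /* wraps modulo 2^64 */
        limb_t c1 = s < a;        /* carry out of a + b */
        limb_t s2 = s + carry;
        limb_t c2 = s2 < s;       /* carry out of adding the carry-in */

        limb_t d  = a - b;        /* wraps modulo 2^64 */
        limb_t b1 = a < b;        /* borrow out of a - b */
        limb_t d2 = d - borrow;
        limb_t b2 = d < borrow;   /* borrow out of subtracting the borrow-in */

        sum[i]  = s2;
        diff[i] = d2;
        carry   = c1 | c2;        /* at most one of c1, c2 can be set */
        borrow  = b1 | b2;        /* at most one of b1, b2 can be set */
    }
    return 2 * carry + borrow;
}
```

Because every iteration loads x[i] and y[i] before storing, this particular
sketch also tolerates exact in-place use (sum == x, diff == y). That is a
property of the sketch only and says nothing about the C fallback in MPIR.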

So do you know what the difference in timings between sumdiff and an
add + sub is on K10 when everything is in cache?

Bill.

On 10 January 2012 18:52, Jason <ja...@njkfrudils.plus.com> wrote:
> On Tuesday 10 January 2012 14:23:35 Bill Hart wrote:
>> I just tried removing mpn_sumdiff_n references from my code, and this
>> slowed it down substantially. So this function is really important for
>> the speed of the FFT.
>
> Given that the speedup of sumdiff itself is pretty small, I would guess that
> the measured speedup comes from better use of the L1 cache. Why don't we just
> document sumdiff in the manual and export it properly? The only question is
> whether to allow both destinations to alias both sources. The asm versions
> can handle this case with no problem, and the existing C implementation can
> too, although it has to allocate space to do so. The new, simpler C
> implementation can't handle this case. As the measured speedup comes from
> better use of the L1 cache, this is a pain. However, if
> HAVE_NATIVE_mpn_sumdiff_n is defined then we know we can alias. So, as in the
> case where sumdiff was "not present" on some arch, we can keep the present
> code (the new fft code) and just use sumdiff under HAVE_NATIVE_mpn_sumdiff_n
> where we have this special case. Otherwise we must use a temp var and
> sumdiff, or even separate add and sub (as that is how it's written now).
>
>>
>> Unfortunately it is not exported by MPIR and even though it is defined
>> for all processors, it is mpn_sumdiff_n in some libraries and
>> _gmpn_sumdiff_n in others and __gmpn_sumdiff_n in others. So this is a
>> total pain in the neck. I'm not sure what the best solution is.
>>
>> Bill.
>>
>> On 9 January 2012 18:01, Bill Hart <goodwillh...@googlemail.com> wrote:
>> > I wouldn't worry about it. It is possible I overwrote my timings file
>> > and that the times are not affected after all.
>> >
>> > On 9 January 2012 17:56, Jason <ja...@njkfrudils.plus.com> wrote:
>> >> On Sunday 08 January 2012 11:30:05 Bill Hart wrote:
>> >>> I decided to try the FFT without addsub_n and it seems to actually go
>> >>> consistently about 3% faster, which is totally mysterious. So I have
>> >>> removed it from the two files it is defined in:
>> >>
>> >>
>> >> That's very strange. I assume this is on a K10; what kind of sizes are
>> >> we talking about?
>> >>
>> >>>
>> >>> ifft_mfa_truncate_sqrt2.c
>> >>> ifft_truncate_sqrt2.c
>> >>>
>> >>> As sumdiff_n seems to be defined for all platforms as far back as the
>> >>> MPIR 2.1 series (even if it is not explicitly exported), I can just
>> >>> extern this in my flint code (and of course it won't be a problem in
>> >>> mpir).
>> >>>
>> >>> So the fft should build on all systems now.
>> >>>
>> >>> Bill.
>> >>>
>> >>>
>>
>>
