On Saturday 31 December 2011 17:47:03 Jason wrote:
> On Tuesday 27 December 2011 17:27:48 Bill Hart wrote:
> > In my FFT I make use of mpn_sumdiff_n and mpn_addsub_n. It seems these
> > are not exported even though there are generic C versions.
> > 
> > Also, I see there is no sumdiff_n.as on core2 style machines. Is it
> > possible to include mpn_sumdiff_n.c in the library on such machines so
> > that it is included unconditionally for all machines?
> > 
> 
> We would still have the other arches to do, i.e. Power, ARM, etc. To make it
> unconditional, addsub needs to allocate some tmp space. I suppose we could
> split addsub into the various overlap cases; this may be possible, but for
> sumdiff I don't think it is.
> 

addsub is possible, and so is addadd, although the case addadd_n(t,x,y,z) where
t=x=y=z requires mul_1(t,x,3), which on core2 and sandybridge is the same speed
as two adds. I don't know about the other arches, although if we consider this
a rare case then it may not be important. For sumdiff the only difficult case
is when the sum and difference are aliased with both of the sources. Could we
exclude this overlap condition? It would also relax the instruction ordering,
which would make it easier to find faster asm versions.
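
To make the overlap cases concrete, here is a rough C sketch of a sumdiff
built from a separate add and sub, choosing the order of the two passes from
the aliasing and rejecting the one difficult case. This is only an
illustration, not the MPIR code: limb_t, add_n, sub_n and sumdiff_n are
hypothetical stand-ins, the return value of 2*carry + borrow is an assumption,
and operands are assumed to be either identical or fully disjoint (no partial
overlap).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t; /* hypothetical limb type, standing in for mp_limb_t */

/* r = x + y over n limbs, returns the carry out. In-place use (r == x or
   r == y) is fine because each limb is read before it is written. */
static limb_t add_n(limb_t *r, const limb_t *x, const limb_t *y, size_t n)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t a = x[i], b = y[i];
        limb_t s = a + b;
        limb_t c1 = s < a;          /* carry from a + b */
        limb_t t = s + carry;
        limb_t c2 = t < s;          /* carry from adding the incoming carry */
        r[i] = t;
        carry = c1 | c2;
    }
    return carry;
}

/* r = x - y over n limbs, returns the borrow out. In-place use is fine. */
static limb_t sub_n(limb_t *r, const limb_t *x, const limb_t *y, size_t n)
{
    limb_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t a = x[i], b = y[i];
        limb_t d = a - b;
        limb_t b1 = a < b;          /* borrow from a - b */
        limb_t t = d - borrow;
        limb_t b2 = d < borrow;     /* borrow from subtracting the incoming borrow */
        r[i] = t;
        borrow = b1 | b2;
    }
    return borrow;
}

/* s = x + y, d = x - y. Returns 2*carry + borrow (assumed convention),
   or -1 for the excluded overlap where both outputs alias the sources. */
static int sumdiff_n(limb_t *s, limb_t *d,
                     const limb_t *x, const limb_t *y, size_t n)
{
    assert(s != d); /* the two outputs must be distinct */

    int s_aliased = (s == x) || (s == y);
    int d_aliased = (d == x) || (d == y);
    limb_t carry, borrow;

    if (s_aliased && d_aliased)
        return -1;  /* whichever pass runs first destroys a source: needs tmp space */

    if (s_aliased) {
        /* s overwrites a source, so take the difference first */
        borrow = sub_n(d, x, y, n);
        carry = add_n(s, x, y, n);
    } else {
        /* d may overwrite a source, so take the sum first */
        carry = add_n(s, x, y, n);
        borrow = sub_n(d, x, y, n);
    }
    return (int)(2 * carry + borrow);
}
```

With this ordering, sumdiff_n(x, d, x, y, n) and sumdiff_n(s, y, x, y, n) both
work without tmp space, and only sumdiff_n(x, y, x, y, n) (or the swapped
variant) is refused. A single-pass asm loop could handle even that case, but
only by loading both source limbs before either store, which is the ordering
constraint that excluding the case would remove.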


> > Is there a reason to not have an assembly optimised version for core2?
> >  
> 
> I haven't found one for core2 or sandybridge which is faster than a separate
> add and sub.
> 
> > Bill.
> > 
> > 
> 
> 
