On Saturday 31 December 2011 17:47:03 Jason wrote:
> On Tuesday 27 December 2011 17:27:48 Bill Hart wrote:
> > In my FFT I make use of mpn_sumdiff_n and mpn_addsub_n. It seems these
> > are not exported even though there are generic C versions.
> >
> > Also, I see there is no sumdiff_n.as on core2 style machines. Is it
> > possible to include mpn_sumdiff_n.c in the library on such machines so
> > that it is included unconditionally for all machines?
> >
>
> we would still have the other arches to do ie power,arm etc ,to make it
> unconditional addsub needs to allocate some tmp space ,I suppose we could
> split the addsub it to various overlap cases this may be possible , but for
> sumdiff I dont think it is
>

addsub is possible, and so is addadd, although the case addadd_n(t,x,y,z)
where t=x=y=z requires mul_1(t,x,3), which on core2 and sandybridge is the
same speed as two adds. I don't know about the other arches, although if we
consider this a rare case then it may not be important.

For sumdiff the only difficult case is when the sum and difference are
aliased with both the sources. Could we exclude this overlap condition? It
would also relax the instruction ordering, which would make it easier to
find faster asm versions.

> > Is there a reason to not have an assembly optimised version for core2?
> >
>
> I havent found one for core2 or sandybridge which is faster than a separate
> add and sub
>
> > Bill.
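
To make the sumdiff aliasing case concrete, here is a minimal portable C
sketch. It is not the MPIR code: limb_t and sketch_sumdiff_n are
hypothetical names, and the return value of 2*carry + borrow is a
convention assumed for illustration. The loop loads both source limbs
before it stores either result limb, so the fully aliased call (sum
written over x, difference written over y) still gives the right answer
without temporary space.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t; /* hypothetical stand-in for a 64-bit limb */

/* Hypothetical sketch: s = x + y and d = x - y over n limbs, least
   significant limb first. Returns 2*carry + borrow (assumed convention).
   Sources are read before results are written in every iteration, so
   s == x together with d == y is tolerated. */
static int sketch_sumdiff_n(limb_t *s, limb_t *d,
                            const limb_t *x, const limb_t *y, size_t n)
{
    limb_t carry = 0, borrow = 0;

    for (size_t i = 0; i < n; i++) {
        limb_t a = x[i], b = y[i];  /* load both sources first */

        limb_t t = a + b;
        limb_t c1 = t < a;          /* carry out of a + b */
        limb_t sum = t + carry;
        limb_t c2 = sum < t;        /* carry out of adding the carry-in */

        limb_t u = a - b;
        limb_t b1 = a < b;          /* borrow out of a - b */
        limb_t diff = u - borrow;
        limb_t b2 = u < borrow;     /* borrow out of the borrow-in */

        s[i] = sum;                 /* store only after both loads */
        d[i] = diff;
        carry = c1 | c2;            /* c1 and c2 are never both set */
        borrow = b1 | b2;           /* b1 and b2 are never both set */
    }
    return (int)(2 * carry + borrow);
}
```

A call such as sketch_sumdiff_n(x, y, x, y, n) then overwrites x with the
sum and y with the difference. The sketch says nothing about speed. The
load-before-store order inside each iteration is the kind of ordering
constraint that excluding this overlap case would relax.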