There's the sandybridge addsub still to do, and that will finish the asm part of toom evaluation at +-1. I'll then write the C bit, and after that do the asm for evaluation at +-2, +-1/2, +-4 etc. This will consist of inclsh's/incrsh's. Then perhaps I'll do a toom23 interpolation combined with a reconstitution, if it seems worthwhile (karasub/add was the toom22 version of this). That will be all for now on toom from me.

Then I want to try to improve our *_basecase's; that will probably be all I get to do before our next release.

How's the FFT going? Let me know what linear asm stuff might be needed so I can start thinking about it, even if I write no code.
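For anyone following along, here is a rough plain C sketch of what evaluation at +-1 amounts to for a three-part operand a(x) = a0 + a1*x + a2*x^2, built on a combined sum/difference pass. It is only an illustration and not the MPIR code: the names `limb_t`, `ref_sumdiff_n` and `ref_toom_eval_pm1`, the 64-bit limb and the return conventions are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t; /* hypothetical limb type, least significant limb first */

/* One pass over n limbs: sum = x + y and diff = x - y.
   Returns 2*carry + borrow (a convention assumed for this sketch).
   Inputs are read before outputs are written, so sum/diff may alias x/y. */
static unsigned ref_sumdiff_n(limb_t *sum, limb_t *diff,
                              const limb_t *x, const limb_t *y, size_t n)
{
    unsigned cy = 0, bw = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        limb_t a = x[i], b = y[i];
        limb_t s = a + b;            /* add chain */
        unsigned c1 = s < a;
        limb_t s2 = s + cy;
        unsigned c2 = s2 < s;
        limb_t d = a - b;            /* subtract chain */
        unsigned b1 = a < b;
        limb_t d2 = d - bw;
        unsigned b2 = d < bw;
        sum[i] = s2;
        diff[i] = d2;
        cy = c1 | c2;
        bw = b1 | b2;
    }
    return 2 * cy + bw;
}

/* Evaluate a(1) = (a0 + a2) + a1 and a(-1) = (a0 + a2) - a1.
   a0, a1, a2 have n limbs; p1 and m1 receive n+1 limbs.
   Returns 1 if a(-1) is negative (m1 then holds |a(-1)|), else 0. */
static int ref_toom_eval_pm1(limb_t *p1, limb_t *m1, const limb_t *a0,
                             const limb_t *a1, const limb_t *a2, size_t n)
{
    limb_t cy = 0;
    assert(n > 0);
    /* t = a0 + a2, kept in p1[0..n) with its carry in top */
    for (size_t i = 0; i < n; i++) {
        limb_t s = a0[i] + a2[i];
        limb_t c1 = s < a0[i];
        limb_t s2 = s + cy;
        limb_t c2 = s2 < s;
        p1[i] = s2;
        cy = c1 | c2;
    }
    limb_t top = cy;

    /* sign of t - a1: only negative when t has no carry limb and t < a1 */
    int neg = 0;
    if (top == 0) {
        size_t i = n;
        while (i > 0 && p1[i - 1] == a1[i - 1])
            i--;
        neg = (i > 0 && p1[i - 1] < a1[i - 1]);
    }

    /* one combined pass gives both t + a1 and |t - a1| */
    unsigned r = neg ? ref_sumdiff_n(p1, m1, a1, p1, n)
                     : ref_sumdiff_n(p1, m1, p1, a1, n);
    p1[n] = top + (r >> 1);
    m1[n] = top - (r & 1);
    return neg;
}
```

The point of fusing the add and the subtract is that each input limb is loaded once and used for both results, which is where the assembly versions get their cycles-per-limb gain over two separate passes.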
On Saturday 23 July 2011 18:25:57 Bill Hart wrote:
> There is a way in Ubuntu. It may be similar in Debian. However, I've
> forgotten what the command is and I think you need sudo powers.

Possibly skynet may be able to accommodate this if the kernel has the right capability handling in it, i.e. you can restrict a particular sudo to only one aspect of the system.

>
> Bill.
>
> On 23 July 2011 16:32, Jason <ja...@njkfrudils.plus.com> wrote:
> > On Saturday 23 July 2011 15:42:08 Jason wrote:
> >> On Saturday 23 July 2011 14:18:09 Jason wrote:
> >> > On Friday 22 July 2011 23:11:29 Jason wrote:
> >> > > On Friday 22 July 2011 17:39:55 Jason wrote:
> >> > > > On Friday 22 July 2011 12:36:21 Jason wrote:
> >> > > > > On Friday 22 July 2011 12:25:42 Bill Hart wrote:
> >> > > > > > That's fantastic! Thanks for all your hard work on these
> >> > > > > > Jason.
> >> > > > > >
> >> > > > > > Bill.
> >> > > > > >
> >> > > > > > On 22 July 2011 11:13, jason <ja...@njkfrudils.plus.com> wrote:
> >> > > > > > > Hi
> >> > > > > > >
> >> > > > > > > New assembler for the nehalem mpn_addadd mpn_addsub
> >> > > > > > > mpn_subadd, used to run at 3.5c/l now at 3.0c/l therefore
> >> > > > > > > optimal. I'll check out how they run on the other intel
> >> > > > > > > chips.
> >> > > > >
> >> > > > > It's also an improvement on core2/penryn and nearly optimal,
> >> > > > > probably just needs a shuffle. Sandybridge doesn't benefit
> >> > > > > though.
> >> > > > >
> >> > > > > > > Note mpn_subadd really should be called mpn_subsub, I'll
> >> > > > > > > change it later.
> >> > > > > > >
> >> > > > > > > Jason
> >> > > > > > >
> >> > > > > > > --
> >> > > > > > > You received this message because you are subscribed to the
> >> > > > > > > Google Groups "mpir-devel" group. To post to this group,
> >> > > > > > > send email to mpir-devel@googlegroups.com. To unsubscribe
> >> > > > > > > from this group, send email to
> >> > > > > > > mpir-devel+unsubscr...@googlegroups.com.
> >> > > > > > > For more options, visit this group at
> >> > > > > > > http://groups.google.com/group/mpir-devel?hl=en.
> >> > >
> >> > > > I tried to do a mpn_addaddadd, i.e. x=y+z+u+v, but on Intel chips the
> >> > > > scheduler can't really cope with it. Also, with 5 pointers you get
> >> > > > so many L1 data cache bank conflicts that the code runs at
> >> > > > different speeds depending on the relative differences between the
> >> > > > pointers mod 64. But if we are mainly using it for toom then we
> >> > > > could perhaps guarantee the relative differences. On the AMD
> >> > > > chips I have a strange problem with my optimizer where it reports
> >> > > > silly numbers for some functions, i.e. sumdiff and addadd. No idea why
> >> > > > it's happening. I even reverted to an earlier svn version where I
> >> > > > found the original fast addadd code, but it still gave silly
> >> > > > figures, yet karaadd was fine?
> >> > > >
> >> > > > Jason
> >> > >
> >> > > New mpn_sumdiff for the nehalem. We didn't have one before, so we can
> >> > > say it would have run at 4.0c/w but now is 3.6c/w (lost a little bit
> >> > > with the feedin). This code also benefits the core2 but not penryn or
> >> > > sandybridge, probably just another trivial shuffle needed :)
> >> > >
> >> > > Jason
> >>
> >> > I've shuffled the sumdiff for the core2, it's a bit faster at 3.5c/w
> >> >
> >> > Jason
> >>
> >> I've shuffled the sumdiff again for penryn and it runs at 3.7c/w
> >>
> >> and for netburst/atom I'll just choose the fastest from what we already
> >> have (just before release). The only one left to do is westmere on the
> >> gcc farm. I don't know how loaded it is, so I might not be able to make
> >> any meaningful comparisons.
> >>
> >> Jason
> >
> > Well gcc20 (westmere) is very lightly loaded at the moment but it has MHz
> > throttling to save power and turbo boost, which means I can't use it for
> > timings.
> > Have to find a software way to temporarily turn these off; on my
> > machines (nehalem,sandybridge,bobcat) I just turned it off in the BIOS.
> >
> > Jason
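To make the quoted discussion of mpn_addadd a little more concrete, here is a minimal C sketch of a fused three-operand addition, r = x + y + z, done in a single pass. It is an assumption-laden illustration rather than MPIR's actual routine: the name `ref_addadd_n`, the argument order, the 64-bit limb and the returned carry range are all chosen for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t; /* hypothetical limb type, least significant limb first */

/* r = x + y + z over n limbs in one pass.
   Returns the carry out of the top limb, which can be 0, 1 or 2.
   Each input limb is read before r[i] is written, so r may alias an input. */
static unsigned ref_addadd_n(limb_t *r, const limb_t *x, const limb_t *y,
                             const limb_t *z, size_t n)
{
    unsigned cy = 0; /* carry between limbs, at most 2 */
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        limb_t s = x[i] + y[i];
        unsigned c = s < x[i];   /* carry from x + y */
        limb_t t = s + z[i];
        c += t < s;              /* carry from adding z */
        limb_t u = t + cy;
        c += u < t;              /* carry from adding the incoming carry */
        r[i] = u;
        cy = c;
    }
    return cy;
}
```

Compared with two separate additions, the fused loop touches four pointers per limb instead of three twice over, which is the saving the cycles-per-limb figures in the thread are measuring. The same structure with five pointers is the x=y+z+u+v case where the bank conflicts mentioned above start to matter.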