On Saturday 23 July 2011 14:18:09 Jason wrote:
> On Friday 22 July 2011 23:11:29 Jason wrote:
> > On Friday 22 July 2011 17:39:55 Jason wrote:
> > > On Friday 22 July 2011 12:36:21 Jason wrote:
> > > > On Friday 22 July 2011 12:25:42 Bill Hart wrote:
> > > > > That's fantastic! Thanks for all your hard work on these Jason.
> > > > >
> > > > > Bill.
> > > > >
> > > > > On 22 July 2011 11:13, jason <ja...@njkfrudils.plus.com> wrote:
> > > > > > Hi
> > > > > >
> > > > > > New assembler for the nehalem mpn_addadd mpn_addsub mpn_subadd ,
> > > > > > used to run at 3.5c/l now at 3.0c/l therefore optimal. I'll check
> > > > > > out how they run on the other intel chips.
> > > >
> > > > it's also an improvement on core2/penryn and nearly optimal ,
> > > > probably just needs a shuffle. Sandybridge doesn't benefit though.
> > > >
> > > > > > Note mpn_subadd really should be called mpn_subsub , I'll change
> > > > > > it later.
> > > > > >
> > > > > > Jason
> > > > > >
> > > > > > --
> > > > > > You received this message because you are subscribed to the
> > > > > > Google Groups "mpir-devel" group. To post to this group, send
> > > > > > email to mpir-devel@googlegroups.com. To unsubscribe from this
> > > > > > group, send email to mpir-devel+unsubscr...@googlegroups.com.
> > > > > > For more options, visit this group at
> > > > > > http://groups.google.com/group/mpir-devel?hl=en.
> > >
> > > I tried to do a mpn_addaddadd ie x=y+z+u+v but on Intel chips the
> > > scheduler can't really cope with it , also with 5 pointers you get so
> > > many L1 data cache bank conflicts that the code runs at different
> > > speeds depending on the relative differences between the pointers
> > > mod 64 . But if we are mainly using it for toom then we could perhaps
> > > guarantee the relative differences.
> > >
> > > On the AMD chips I have a strange problem with my optimizer where it
> > > reports silly numbers for some functions ie sumdiff addadd , no idea
> > > why it's happening , I even reverted to an earlier svn version where
> > > I found the original fast addadd code , but it still gave silly
> > > figures , but karaadd was fine ?
> > >
> > > Jason
> >
> > New mpn_sumdiff for the nehalem , we didn't have one before so we can
> > say it would have run at 4.0c/w but now is 3.6c/w (lost a little bit
> > with the feedin) , this code also benefits the core2 but not penryn or
> > sandybridge , probably just another trivial shuffle needed :)
> >
> > Jason
>
> I've shuffled the sumdiff for the core2 it's a bit faster at 3.5c/w
>
> Jason

I've shuffled the sumdiff again for penryn and it runs at 3.7c/w. For netburst/atom I'll just choose the fastest from what we already have (just before release). The only one left to do is westmere on the gcc farm. I don't know how loaded it is, so I might not be able to make any meaningful comparisons.

Jason
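
For anyone reading along who hasn't met these functions, here is a rough portable C sketch of the arithmetic that addadd and sumdiff perform over n limbs. It is only a reference model, not the assembler discussed above. The names ref_addadd_n, ref_sumdiff_n and limb_t are made up for illustration, and the 2*carry + borrow return value for sumdiff is an assumed convention, so check the MPIR sources before relying on any of it.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t; /* hypothetical limb type: one 64-bit word */

/* r = a + b + c over n limbs. Returns the carry out, which can be 0, 1 or 2.
   Each r[i] is written only after a[i], b[i], c[i] are read, so r may
   alias any of the inputs. */
static limb_t ref_addadd_n(limb_t *r, const limb_t *a, const limb_t *b,
                           const limb_t *c, size_t n)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = a[i] + b[i];
        limb_t c1 = s < a[i];      /* wrapped on a + b */
        limb_t t = s + c[i];
        limb_t c2 = t < s;         /* wrapped on + c */
        limb_t u = t + carry;
        limb_t c3 = u < t;         /* wrapped on + carry in */
        r[i] = u;
        carry = c1 + c2 + c3;      /* at most 2 in total */
    }
    return carry;
}

/* s = a + b and d = a - b in a single pass over the inputs.
   Returns 2*carry + borrow (assumed convention, see the note above). */
static limb_t ref_sumdiff_n(limb_t *s, limb_t *d, const limb_t *a,
                            const limb_t *b, size_t n)
{
    limb_t cy = 0, bw = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t x = a[i], y = b[i];

        limb_t t = x + y;
        limb_t c1 = t < x;
        limb_t sum = t + cy;
        limb_t c2 = sum < t;

        limb_t u = x - y;
        limb_t b1 = x < y;
        limb_t diff = u - bw;
        limb_t b2 = u < bw;

        s[i] = sum;
        d[i] = diff;
        cy = c1 | c2;              /* at most one of the two can be set */
        bw = b1 | b2;
    }
    return 2 * cy + bw;
}
```

The point of the fused versions is that each input limb is loaded once and feeds two results, which is where the saving over separate add and sub passes comes from. The c/l and c/w figures quoted above are cycles per limb of this loop in the hand-scheduled assembler.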