On Tuesday 05 July 2011 12:23:34 jason wrote:
> On Jul 4, 8:48 pm, Jason <ja...@njkfrudils.plus.com> wrote:
> > On Monday 04 July 2011 20:21:46 Cactus wrote:
> > > Looks good!
> > >
> > > I notice that there is new k8 assembler. Is this going to be repeated
> > > for the Intel architectures?
> >
> > Yep, the current code is already an improvement on Intel, but I can do
> > better eg the multiple carry handling.
> >
> > > Is it stable enough to do conversion for
> > > Windows?
> >
> > I'd leave it for a week or so, I'll simplify the feed-in code, the
> > wind-down (as rdx is fixed), and merge the small loops. We did lose
> > some speed in the inner loop so perhaps a longer/more sophisticated
> > search may find something better.
> > One interesting "feature", the K10 version uses popcount :)
> >
> > > Brian
>
> The AMD versions are ready for conversion, note that they are all
> very similar
>
> Jason

After running on various CPUs we get these results (cycles per word):

           just    with     kara    ld/st   latency  op
cpu        add     addadd   add     bnd     bound    bound
K8/k10     4.5     3.7      2.35    2.0     2.25     2.041666
K102       4.5     3.5      2.3     2.0     2.25     2.041666
core2      6.0     5.5      5.0     3.0     4.125    2.583333
penryn     6.0     5.8      4.9     3.0     4.125    2.583333
nehalem    6.0     5.5      4.7     3.0     4.125    2.583333
westmere   6.0                      3.0     4.125    2.583333
sandybr    4.8     ---      4.0             2.625    2.583333
bobcat     7.5     6.2      3.75            2.25     3.0625
atom       12.3    -----    8.0                      3.75

Fairly sure the core2/nehalem can be improved.

The latency bound comes from a false dependence. We could improve this by
storing one of the carry flags on the stack (setc, bt). This very slightly
increases the ld/st bound and reduces the latency bound, e.g.

on K8/k10/K102       ld/st is 2.125   latency is 1.5    op is 2.041666
on core2..westmere   ld/st is 3.125   latency is 2.75   op is 2.58333

I'll post some results on the mul speedups later.

Jason
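
To make the effect of the setc/bt change easier to see, here is a small sketch that picks out the limiting bound before and after it. It assumes (this is not stated in the post) that the loop can run no faster than the largest of its ld/st, latency and op bounds. The names `BOUNDS`, `limiting_bound` and `report` are made up for this illustration; the numbers are copied from the figures above.

```python
# Bounds in cycles per word for the kara add loop, copied from the post.
# Assumption (not from the post): the loop cannot beat the largest of its
# three bounds, so that largest bound is the one worth attacking.

BOUNDS = {
    "K8/k10": {
        "current": {"ld/st": 2.0, "latency": 2.25, "op": 2.041666},
        "carry_on_stack": {"ld/st": 2.125, "latency": 1.5, "op": 2.041666},
    },
    "core2": {
        "current": {"ld/st": 3.0, "latency": 4.125, "op": 2.583333},
        "carry_on_stack": {"ld/st": 3.125, "latency": 2.75, "op": 2.58333},
    },
}


def limiting_bound(bounds):
    """Return (name, cycles) for the largest bound in the given dict."""
    name = max(bounds, key=bounds.get)
    return name, bounds[name]


def report():
    """Build one summary line per cpu and variant."""
    lines = []
    for cpu, variants in BOUNDS.items():
        for variant, bounds in variants.items():
            name, cycles = limiting_bound(bounds)
            lines.append(
                f"{cpu:8s} {variant:15s} limited by {name:8s} at {cycles} c/w"
            )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Under that assumption the limiter moves from latency to ld/st in both cases: the floor drops from 2.25 to 2.125 on K8/k10 and from 4.125 to 3.125 on core2, against measured kara add figures of 2.35 and 5.0.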