On 4 December 2010 00:52, Jason <ja...@njkfrudils.plus.com> wrote:
> Hi
>
> Here's the first lot of new assembler code for the x64 (in trunk).
>
> popcount/hamdist are not terribly useful for MPIR, but they do offer a simple
> way to practice stuff.
>
> K8 popcount was 5.5c/l with 2way unroll, now 4.66c/l with 3way
> K8 hamdist was 5.5c/l with 2way unroll, now 5.0c/l with 3way
>
> The above was just practice for the core2 version, which uses SSE. If I'm
> going to try to use SSE for anything other than trivial copies/logic then I
> need the practice.
>
> core2/penryn popcount was 6.5c/l with 4way unroll, now 2.75c/l with 4way
>
> The hamdist shows similar improvements; I just have to write the horrible SSE
> alignment stuff, yuck.
>
> K10 popcount was 1.5c/l with 4way unroll, now 1.0c/l with 2way
> K10 hamdist was 1.9c/l with 4way unroll, now 1.5c/l with 4way
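For readers unfamiliar with these routines: popcount counts the set bits in an array of limbs, and hamdist counts the bit positions in which two equal-length limb arrays differ; the c/l figures above are cycles per limb. The following is only a minimal portable C sketch of what the routines compute, not the assembler discussed in this thread. It assumes 64-bit limbs, uses hypothetical names (`sketch_popcount`, `sketch_hamdist`, `popcount_limb`), and swaps in a bit-twiddling (SWAR) per-limb count in place of the POPCNT instruction or the SSE code. The four independent accumulators mimic a 4-way unroll: they break the single add dependency chain so an out-of-order core can overlap the work for several limbs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count set bits in one 64-bit limb (SWAR method). */
static inline uint64_t popcount_limb(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);              /* 2-bit counts */
    x = (x & 0x3333333333333333ULL)
        + ((x >> 2) & 0x3333333333333333ULL);                /* 4-bit counts */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;              /* byte counts */
    return (x * 0x0101010101010101ULL) >> 56;                /* sum of bytes */
}

/* Hypothetical: total set bits in up[0..n-1], 4-way unrolled. */
uint64_t sketch_popcount(const uint64_t *up, size_t n)
{
    uint64_t c0 = 0, c1 = 0, c2 = 0, c3 = 0;  /* independent accumulators */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        c0 += popcount_limb(up[i]);
        c1 += popcount_limb(up[i + 1]);
        c2 += popcount_limb(up[i + 2]);
        c3 += popcount_limb(up[i + 3]);
    }
    for (; i < n; i++)                        /* leftover 0..3 limbs */
        c0 += popcount_limb(up[i]);
    return c0 + c1 + c2 + c3;
}

/* Hypothetical: number of differing bit positions between up and vp. */
uint64_t sketch_hamdist(const uint64_t *up, const uint64_t *vp, size_t n)
{
    uint64_t c0 = 0, c1 = 0, c2 = 0, c3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        c0 += popcount_limb(up[i] ^ vp[i]);
        c1 += popcount_limb(up[i + 1] ^ vp[i + 1]);
        c2 += popcount_limb(up[i + 2] ^ vp[i + 2]);
        c3 += popcount_limb(up[i + 3] ^ vp[i + 3]);
    }
    for (; i < n; i++)
        c0 += popcount_limb(up[i] ^ vp[i]);
    return c0 + c1 + c2 + c3;
}
```

hamdist is just popcount applied to the XOR of the two operands, which is why the two routines tend to be tuned together and hamdist costs a little more per limb (an extra load and XOR).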
Wow, sounds like a lot of great work, Jason.

>
> The above are "optimal", although for very large unrolls, 28way (10way is
> probably the minimum), we could get down to 0.87c/l for popcount because we do
> have a spare ALU slot.
> The above is more interesting than that as it's very similar to the limits of
> addmul

Not sure what you mean. Do you mean that the point at which it drops to the
lower time is the same as for addmul?

>
> Should be able to get the nehalem to run at the same speed as the K10, but so
> far a scheduling conflict with the jcc inst is preventing this.
> Best so far (and current code in trunk) is 1.25 and 1.9 c/l.
>
> I'll see if I can come up with the Windows version tomorrow.
>

Sounds good. I'm not sure where these get used, but if it gives you practice
for other things then it's pretty valuable.

Bill.