Hi

The new mul_1 for the Nehalem is in trunk; it runs at a measured 3390 cycles, i.e. 3.333c/l. Note that it is very sensitive to the feed-in/wind-down code; notice the spurious instructions needed :(

Jason

On Friday 24 December 2010 04:19:11 Jason wrote:
> I should have said the below is for the Nehalem/Westmere only.
>
> On Friday 24 December 2010 04:18:26 Jason wrote:
> > Hi, now that I have more accurate timings, here are the real changes
> > made from mpir-2.2 to the upcoming mpir-2.3:
> >
> > popcount 1310 to 1066, i.e. 1.25c/l at 4-way to 1.0c/l at 6-way
> > hamdist  2036 to 2040, i.e. 2.0c/l at 4-way to 2.0c/l at 2-way
> > mul_1    3779 to 3610, i.e. 3.75c/l at 4-way to 3.563c/l at 3-way
> > mul_2    7961 to 7172, i.e. 7.9c/l at 3-way to 7.1c/l at 3-way
> >
> > The popcount and hamdist are as before, but mul_1 and mul_2 are showing
> > some bit rot; in light of the better timings I'll give them another
> > go.
> >
> > Jason
> >
> > On Dec 22, 5:57 pm, Jason <ja...@njkfrudils.plus.com> wrote:
> > > On Wednesday 22 December 2010 10:15:39 Cactus wrote:
> > > > On Dec 22, 9:08 am, Jason <ja...@njkfrudils.plus.com> wrote:
> > > > > Hi
> > > > >
> > > > > In trunk there is a new mpn_mul_2 for the Nehalem/Westmere. The
> > > > > old one ran at (a measured) 7.59c/l and the new one at 6.84c/l,
> > > > > about a 10% speed-up; the optimum would be 6.0c/l (bound by add
> > > > > latency), which would give a measured 5.87c/l. I'm going to try
> > > > > adding a cpuid serializing instruction to our timing code to see
> > > > > if we can get proper timings on the Nehalem. Note: this new
> > > > > function is VERY sensitive to the exact feed-in and wind-down
> > > > > code; it's a right old PITA. If only I could put the pipelines in
> > > > > a known state at the start of the function, or time it with the
> > > > > exact feed-in code.
> > > >
> > > > Hi Jason,
> > > >
> > > > I have added it to the Nehalem x64 builds on Windows.
> > > >
> > > > Of course, the feed in/out code is different, so it's quite possible
> > > > that this will interfere with the optimisation.
> > > >
> > > > Brian
> > >
> > > It seems we already use cpuid to serialize; however, turning off
> > > turbo-boost in the BIOS solves it.
> > >
> > > With turbo-boost:
> > >
> > > ./speed -c -s 1000 mpn_add_n
> > > overhead 6.00 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq 2664.58 MHz
> > >               mpn_add_n
> > > 1000            1933.00
> > >
> > > and with turbo-boost turned off:
> > >
> > > ./speed -c -s 1000 mpn_add_n
> > > overhead 6.00 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq 2664.58 MHz
> > >               mpn_add_n
> > > 1000            2030.00
> > >
> > > Clearly rdtsc counts the base clock: if a core is boosted, rdtsc still
> > > ticks at the base rate, so the boosted core gets more work done per
> > > tick and the cycle counts come out impossibly low. I think I'll leave
> > > my BIOS with turbo-boost switched off; accurate answers are far more
> > > important than a 5% speedup.
> > >
> > > Jason

--
You received this message because you are subscribed to the Google Groups "mpir-devel" group.
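The mul_1 figures quoted throughout the thread are for the mpn routine that multiplies an n-limb number by a single limb, writing the low n limbs of the product and returning the high limb. As a minimal sketch of those semantics only (not MPIR's Nehalem assembly; `ref_mul_1` is a hypothetical name), assuming 64-bit limbs and a compiler that provides `unsigned __int128` (GCC or Clang on x86_64):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference (NOT MPIR's code) for what mpn_mul_1 computes:
 *   {rp, n} = low n limbs of {up, n} * v, return value = high limb.
 * Assumes 64-bit limbs and the GCC/Clang unsigned __int128 extension.
 * rp may equal up, since up[i] is read before rp[i] is written. */
uint64_t ref_mul_1(uint64_t *rp, const uint64_t *up, size_t n, uint64_t v)
{
    assert(n >= 1);               /* this sketch requires at least one limb */
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* 64x64 -> 128-bit product plus the incoming carry; this cannot
         * overflow 128 bits since (2^64-1)^2 + (2^64-1) < 2^128. */
        unsigned __int128 p = (unsigned __int128)up[i] * v + carry;
        rp[i] = (uint64_t)p;          /* low half is the result limb */
        carry = (uint64_t)(p >> 64);  /* high half feeds the next limb */
    }
    return carry;
}
```

Each iteration depends on the carry produced by the previous one. The hand-scheduled asm versions presumably hide the multiply latency by working on several limbs per loop iteration, which is what the "3-way"/"4-way" labels in the thread appear to describe; handling the leftover limbs is then the job of the feed-in and wind-down code.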