I should have said the figures below are for the Nehalem/Westmere only.

On Friday 24 December 2010 04:18:26 jason wrote:
> Hi, now that I have more accurate timings, here are the real changes
> from mpir-2.2 to the upcoming mpir-2.3:
> 
> popcount  1310 to 1066, i.e. 1.25c/l at 4-way to 1.0c/l at 6-way
> hamdist   2036 to 2040, i.e. 2.0c/l at 4-way to 2.0c/l at 2-way
> mul_1     3779 to 3610, i.e. 3.75c/l at 4-way to 3.563c/l at 3-way
> mul_2     7961 to 7172, i.e. 7.9c/l at 3-way to 7.1c/l at 3-way
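
The pairs above are easier to compare as percentage changes. What follows is a minimal sketch, assuming each pair is a raw ./speed cycle count for the same (unstated) operand size on the same machine. The names percent_change, speedup, timings and print_timings are hypothetical illustrations, not MPIR functions.

```c
#include <stdio.h>

/* Hypothetical helper: relative change from old to new, in percent.
   Negative means the new code takes fewer cycles (i.e. is faster). */
static double percent_change(double old_cycles, double new_cycles)
{
    return 100.0 * (new_cycles - old_cycles) / old_cycles;
}

/* Speed-up factor old/new; above 1.0 means the new code is faster. */
static double speedup(double old_cycles, double new_cycles)
{
    return old_cycles / new_cycles;
}

/* Figures copied from the list above (mpir-2.2 -> mpir-2.3). */
struct timing { const char *name; double old_cycles, new_cycles; };

static const struct timing timings[] = {
    { "popcount", 1310.0, 1066.0 },
    { "hamdist",  2036.0, 2040.0 },
    { "mul_1",    3779.0, 3610.0 },
    { "mul_2",    7961.0, 7172.0 },
};

/* Print one line per function: name, percent change, speed-up factor. */
static void print_timings(void)
{
    size_t i;
    for (i = 0; i < sizeof timings / sizeof timings[0]; i++)
        printf("%-9s %+6.1f%%  x%.3f\n", timings[i].name,
               percent_change(timings[i].old_cycles, timings[i].new_cycles),
               speedup(timings[i].old_cycles, timings[i].new_cycles));
}
```

On these numbers popcount improves by about 18.6%, mul_1 by about 4.5% and mul_2 by about 9.9%, while hamdist is about 0.2% slower, i.e. within noise.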
> 
> The popcount and hamdist figures are as before, but mul_1 and mul_2 are
> showing some bit rot; in light of the better timings I'll give them
> another go.
> 
> Jason
> 
> On Dec 22, 5:57 pm, Jason <ja...@njkfrudils.plus.com> wrote:
> > On Wednesday 22 December 2010 10:15:39 Cactus wrote:
> > > On Dec 22, 9:08 am, Jason <ja...@njkfrudils.plus.com> wrote:
> > > > Hi
> > > > 
> > > > In trunk there is a new mpn_mul_2 for the Nehalem/Westmere. The old
> > > > one ran at (a measured) 7.59c/l and the new one at 6.84c/l, about a
> > > > 10% speed-up. The optimum would be 6.0c/l (bound by add latency),
> > > > which would give a measured 5.87c/l. I'm going to try adding a cpuid
> > > > serializing instruction to our timing code to see if we can get
> > > > proper timings for the Nehalem. Note: this new function is VERY
> > > > sensitive to the exact feed-in and wind-down code; it's a right old
> > > > PITA. If only I could put the pipelines in a known state at the
> > > > start of the function, or time it with the exact feed-in code.
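
rdtsc is not ordered with respect to the instructions around it, so a serializing instruction such as cpuid is placed in front of it to make sure earlier work has finished before the counter is sampled. Below is a minimal sketch of that idea, assuming GCC or Clang on x86 (the __cpuid macro from <cpuid.h> and __rdtsc() from <x86intrin.h>). It is not MPIR's actual timing code; serialized_tsc, time_once, sum_args and sum_limbs are hypothetical names, and the non-x86 fallback returns nanoseconds rather than TSC ticks so that the sketch still compiles elsewhere.

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <x86intrin.h>

/* cpuid is serializing: all earlier instructions complete before it does,
   so no earlier work is still in flight when rdtsc reads the counter. */
static inline uint64_t serialized_tsc(void)
{
    unsigned int eax, ebx, ecx, edx;
    __cpuid(0, eax, ebx, ecx, edx);
    (void)eax; (void)ebx; (void)ecx; (void)edx;
    return (uint64_t)__rdtsc();
}
#else
#include <time.h>

/* Non-x86 fallback so the sketch builds: monotonic nanoseconds,
   NOT TSC ticks, so results are not comparable to cycle counts. */
static inline uint64_t serialized_tsc(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical harness: time one call of fn(arg) and subtract a
   caller-measured overhead (e.g. two back-to-back serialized_tsc calls). */
static uint64_t time_once(void (*fn)(void *), void *arg, uint64_t overhead)
{
    uint64_t t0 = serialized_tsc();
    fn(arg);
    uint64_t t1 = serialized_tsc();
    uint64_t dt = t1 - t0;
    return dt > overhead ? dt - overhead : 0;
}

/* Example workload standing in for an mpn call: sum n limbs. */
struct sum_args { const uint64_t *p; size_t n; uint64_t result; };

static void sum_limbs(void *arg)
{
    struct sum_args *s = arg;
    uint64_t acc = 0;
    size_t i;
    for (i = 0; i < s->n; i++)
        acc += s->p[i];
    s->result = acc;
}
```

Dividing the result of time_once by the number of limbs gives TSC ticks per limb. As the rest of the thread shows, those ticks only equal core cycles when the core is not running above its base frequency.
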
> > > 
> > > Hi Jason,
> > > 
> > > I have added it to the nehalem x64 builds on Windows.
> > > 
> > > Of course, the feed-in/out code is different, so it's quite possible
> > > that this will interfere with the optimisation.
> > > 
> > >     Brian
> > 
> > It seems we already use cpuid to serialize; however, turning off
> > turbo-boost in the BIOS solves it.
> > 
> > With turbo-boost on:
> > 
> > ./speed  -c -s 1000 mpn_add_n
> > overhead 6.00 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq 2664.58 MHz
> > 
> >             mpn_add_n
> > 
> > 1000          1933.00
> > 
> > And with turbo-boost turned off:
> > 
> > ./speed  -c -s 1000 mpn_add_n
> > overhead 6.00 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq 2664.58 MHz
> > 
> >             mpn_add_n
> > 
> > 1000          2030.00
> > 
> > Clearly rdtsc counts the base clock: if one core is boosted, rdtsc
> > still counts at the base clock rate, giving impossible answers. I think
> > I'll leave my BIOS with turbo-boost switched off; accurate answers are
> > far more important than a 5% speed-up.
> > 
> > Jason
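
The size of the error can be estimated from the two mpn_add_n runs above. The estimate assumes the code takes the same number of core cycles either way (2030 for 1000 limbs, the turbo-off figure) and that the TSC ticks at the reported 2664.58 MHz. Under those assumptions, the turbo-on reading of 1933 ticks implies a faster core clock. The helpers below are a hypothetical sketch of that arithmetic, not part of MPIR or ./speed.

```c
/* Assumption: the TSC ticks at the fixed base frequency while the core may
   run faster, and the timed code takes a fixed number of core cycles. */

/* Core clock implied by observing tsc_ticks for work known to take
   core_cycles at the base frequency base_mhz. */
static double implied_core_mhz(double tsc_ticks, double core_cycles,
                               double base_mhz)
{
    return base_mhz * core_cycles / tsc_ticks;
}

/* Convert a TSC reading into core cycles, given the actual core clock. */
static double tsc_to_core_cycles(double tsc_ticks, double core_mhz,
                                 double base_mhz)
{
    return tsc_ticks * core_mhz / base_mhz;
}
```

With the figures above, implied_core_mhz(1933.0, 2030.0, 2664.58) comes out at about 2798 MHz, roughly 5% above base. A per-limb rate taken with turbo on (1.933 ticks/limb here) therefore understates the true 2.03 cycles/limb by the same factor. This is how a TSC-based rate can produce the "impossible answers" mentioned above, falling below a hard bound such as the 6.0c/l add-latency limit.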

-- 
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-de...@googlegroups.com.
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en.
