Hi, now that I have more accurate timings, here are the real changes made
from mpir-2.2 to the upcoming mpir-2.3:

popcount  1310 to 1066, i.e. 1.25c/l at 4-way to 1.0c/l at 6-way
hamdist   2036 to 2040, i.e. 2.0c/l at 4-way to 2.0c/l at 2-way
mul_1     3779 to 3610, i.e. 3.75c/l at 4-way to 3.563c/l at 3-way
mul_2     7961 to 7172, i.e. 7.9c/l at 3-way to 7.1c/l at 3-way

The popcount and hamdist are as before, but mul_1 and mul_2 are showing
some bit rot. In light of the better timings I'll give them another
go.
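
For reference, the relative changes in the raw cycle counts in the table above can be worked out with a few lines of Python. This is only a sketch of the arithmetic: the `timings` dictionary and the `percent_change` helper are names made up for illustration, and the numbers are copied straight from the table.

```python
# Raw cycle counts copied from the table above: (mpir-2.2, mpir-2.3).
timings = {
    "popcount": (1310, 1066),
    "hamdist": (2036, 2040),
    "mul_1": (3779, 3610),
    "mul_2": (7961, 7172),
}


def percent_change(old, new):
    """Return the change from old to new as a percentage of old.

    Negative values mean fewer cycles (faster).
    """
    return 100.0 * (new - old) / old


# Hypothetical helper output: one percentage per function.
changes = {name: percent_change(old, new) for name, (old, new) in timings.items()}

for name, change in changes.items():
    print(f"{name:9s} {change:+.1f}%")
```

On these figures popcount drops by roughly 19% and mul_2 by roughly 10%, while hamdist is unchanged to within noise.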

Jason


On Dec 22, 5:57 pm, Jason <ja...@njkfrudils.plus.com> wrote:
> On Wednesday 22 December 2010 10:15:39 Cactus wrote:
>
>
>
> > On Dec 22, 9:08 am, Jason <ja...@njkfrudils.plus.com> wrote:
> > > Hi
>
> > > In trunk there is a new mpn_mul_2 for Nehalem/Westmere. The old one
> > > ran at (a measured) 7.59c/l and the new one at 6.84c/l, about a 10%
> > > speed-up. The optimum would be 6.0c/l (bound by add latency), which would
> > > give a measured 5.87c/l. I'm going to try adding a cpuid serializing
> > > instruction to our timing code to see if we can get proper timings for
> > > Nehalem. Note: this new function is VERY sensitive to the exact
> > > feed-in and wind-down code; it's a right old PITA. If only I could put
> > > the pipelines in a known state at the start of the function, or time it
> > > with the exact feed-in code.
>
> > Hi Jason,
>
> > I have added it to the nehalem x64 builds on Windows.
>
> > Of course, the feed in/out code is different so it's quite possible
> > that this will interfere with the optimisation.
>
> >     Brian
>
> It seems we already use cpuid to serialize; however, turning off turbo-boost
> in the BIOS solves it.
>
> with turbo boost
>
> ./speed  -c -s 1000 mpn_add_n
> overhead 6.00 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq
> 2664.58 MHz
>             mpn_add_n
> 1000          1933.00
>
> and with turbo-boost turned off
>
> ./speed  -c -s 1000 mpn_add_n
> overhead 6.00 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq
> 2664.58 MHz
>             mpn_add_n
> 1000          2030.00
>
> Clearly rdtsc counts the base clock, and if one core is boosted rdtsc still
> counts the base clock, giving impossible answers. I think I'll leave my
> BIOS with turbo-boost switched off; accurate answers are far more important
> than a 5% speedup.
>
> Jason
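
The two mpn_add_n measurements quoted above give a rough idea of how far the boosted core clock was from the base clock that rdtsc counts. A minimal Python sketch, assuming the turbo-off figure (2030) is the true core-cycle count and that the core stayed boosted for the whole turbo-on run; the function names are hypothetical and not part of the speed program:

```python
def boost_ratio(cycles_turbo_off, ticks_turbo_on):
    """Estimate core clock / TSC clock from the two measurements.

    Assumes the work takes the same number of core cycles in both runs,
    and that rdtsc ticks at the base clock regardless of boost.
    """
    if cycles_turbo_off <= 0 or ticks_turbo_on <= 0:
        raise ValueError("cycle counts must be positive")
    return cycles_turbo_off / ticks_turbo_on


def estimated_core_mhz(base_mhz, ratio):
    """Scale the base frequency reported by speed by the boost ratio."""
    return base_mhz * ratio


ratio = boost_ratio(2030.0, 1933.0)            # figures from the two runs above
core_mhz = estimated_core_mhz(2664.58, ratio)  # CPU freq line from the speed output
print(f"boost ratio ~ {ratio:.3f}, implied core clock ~ {core_mhz:.0f} MHz")
```

The ratio comes out at about 1.05, which matches the 5% figure mentioned in the quoted message.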
