Hi

The new mul_1 for Nehalem is in trunk. It runs at a measured 3390 cycles,
i.e. 3.333 c/l. Note that it is very sensitive to the feed-in/wind-down code;
notice the spurious instructions needed :(

Jason


On Friday 24 December 2010 04:19:11 Jason wrote:
> I should have said that the below is for Nehalem/Westmere only.
> 
> On Friday 24 December 2010 04:18:26 jason wrote:
> > Hi, now that I have more accurate timings, here are the real changes made
> > from mpir-2.2 to the upcoming mpir-2.3:
> > 
> > popcount  1310 to 1066  i.e. 1.25 c/l at 4-way to 1.0 c/l at 6-way
> > hamdist   2036 to 2040  i.e. 2.0 c/l at 4-way to 2.0 c/l at 2-way
> > mul_1     3779 to 3610  i.e. 3.75 c/l at 4-way to 3.563 c/l at 3-way
> > mul_2     7961 to 7172  i.e. 7.9 c/l at 3-way to 7.1 c/l at 3-way
> > 
> > The popcount and hamdist are as before, but mul_1 and mul_2 are showing
> > some bit rot; in light of the better timings I'll give them another
> > go.
> > 
> > Jason
> > 
> > On Dec 22, 5:57 pm, Jason <ja...@njkfrudils.plus.com> wrote:
> > > On Wednesday 22 December 2010 10:15:39 Cactus wrote:
> > > > On Dec 22, 9:08 am, Jason <ja...@njkfrudils.plus.com> wrote:
> > > > > Hi
> > > > > 
> > > > > In trunk there is a new mpn_mul_2 for Nehalem/Westmere. The old
> > > > > one ran at (a measured) 7.59 c/l and the new one at 6.84 c/l,
> > > > > about a 10% speed-up. The optimum would be 6.0 c/l (bound by add
> > > > > latency), which would give a measured 5.87 c/l. I'm going to try
> > > > > adding a cpuid serializing instruction to our timing code to see
> > > > > if we can get proper timing on Nehalem. Note: this new function
> > > > > is VERY sensitive to the exact feed-in and wind-down code; it's a
> > > > > right old PITA. If only I could put the pipelines in a known state
> > > > > at the start of the function, or time it with the exact feed-in
> > > > > code.
> > > > 
> > > > Hi Jason,
> > > > 
> > > > I have added it to the nehalem x64 builds on Windows.
> > > > 
> > > > Of course, the feed-in/out code is different, so it's quite possible
> > > > that this will interfere with the optimisation.
> > > > 
> > > >     Brian
> > > 
> > > It seems we already use cpuid to serialize; however, turning off
> > > Turbo Boost in the BIOS solves it.
> > > 
> > > With Turbo Boost on:
> > > 
> > > ./speed  -c -s 1000 mpn_add_n
> > > overhead 6.00 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq 2664.58 MHz
> > > 
> > >             mpn_add_n
> > > 
> > > 1000          1933.00
> > > 
> > > and with Turbo Boost turned off:
> > > 
> > > ./speed  -c -s 1000 mpn_add_n
> > > overhead 6.00 cycles, precision 1000000 units of 3.75e-10 secs, CPU freq 2664.58 MHz
> > > 
> > >             mpn_add_n
> > > 
> > > 1000          2030.00
> > > 
> > > Clearly rdtsc counts the base clock, and if one core is boosted rdtsc
> > > still counts the base clock, giving impossible answers. I think I'll
> > > leave my BIOS with Turbo Boost switched off; accurate answers are far
> > > more important than a 5% speed-up.
> > > 
> > > Jason
