On Thursday 19 February 2009 01:05:38 Jason Martin wrote:
> On Wed, Feb 18, 2009 at 8:05 PM,  <ja...@njkfrudils.plus.com> wrote:
> > On Thursday 19 February 2009 00:49:16 Jason Martin wrote:
> >> On Wed, Feb 18, 2009 at 7:13 PM,  <ja...@njkfrudils.plus.com> wrote:
> >> > On Wednesday 18 February 2009 22:03:43 Mariah wrote:
> > >> >> gmp-4.2.4   mpir-0.9.0
> > >> >>
> > >> >> 2241.9      2251         cicero (pentium4-pc-linux-gnu)
> > >> >> 3371.5      3369.3       cleo (ia64-unknown-linux-gnu)
> > >> >> 6024.5      7437.8       eno (core2-unknown-linux-gnu)
> > >> >> 6022.2      7387.1       fulvia (core2-pc-solaris2.10)
> > >> >> 3367.8      3369.5       iras (ia64-unknown-linux-gnu)
> > >> >> 1341.3      1343.6       mark (ultrasparc3-sun-solaris2.10)
> > >> >> 6100        7421.1       menas (core2-unknown-linux-gnu)
> >> >>
> >> >> Mariah
> >> >
> > >> > K10 crushes Core 2 (Intel fanbois hide their heads in shame :)
> >> >
> > >> > gmp-4.2.4   mpir-0.9.0   r1614-k8-branch
> > >> > 6014        7379         10118             box1  (k8-unknown-linux-gnu)  1.8GHz
> > >> > 9301        11659        15514             cuda1 (k10-unknown-linux-gnu) 2.6GHz
> >>
> >> I don't think that the core2 can get much faster... the addmul (and
> >> friends) are running just shy of 4 cycles/limb which is the max
> >> throughput rate for the 64-bit multiply instruction on core2.  I'm
> >> appropriately hiding in shame :-)
> >
> > A lot of the speed comes from reducing the overhead in mul_basecase. The
> > first mul_basecase I did that used a 2.5 cycles/limb addmul loop gave a
> > score of 8200.
> >
> > How does a 20x20 mul_basecase compare with a 400-limb addmul_1?
> >
> > For K8 we have:
> > 1110 cycles for 20x20 mul_basecase
> > 1031 cycles for 400-limb addmul_1
> > i.e. a 7.6% overhead for basecase over addmul
>
> That code can probably be dropped straight into the core2 code, can't
> it?  I haven't looked at your mul_basecase carefully, but as long as
> you aren't using the "inc" or "dec" instructions too much, then it should
> support Core 2 as well as K10.
>
> --jason

I think there's an "inc" once per row, but I'm sure we can replace it with add $1,xxx.

I still have to check that they are all up to speed on the K10 (I'm sure they
are, but...). If someone can give me access to a Core 2 machine, I can go
through all the K8 asm code and see what is worth doing.
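A note on the 20x20 versus 400-limb comparison above: a schoolbook mul_basecase does one mul_1 row and then vn-1 addmul_1 rows, so a 20x20 product performs exactly the same 400 limb multiply-accumulates as a single 400-limb addmul_1, and any extra cycles are per-row overhead (loop setup, pointer updates, storing the carry-out limb). Below is a minimal portable C sketch of that structure, assuming 32-bit limbs so each limb product fits in uint64_t. limb_t, mac_steps, ref_mul_1, ref_addmul_1, ref_mul_basecase and overhead_pct are hypothetical names for illustration only; this is not MPIR's API or its hand-written x86-64 assembly, which uses 64-bit limbs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limb so the sketch is portable C;
   MPIR's real limbs are 64 bits on x86-64. */
typedef uint32_t limb_t;

/* Counts limb-by-limb multiply-accumulate steps, to compare work done. */
static unsigned long mac_steps = 0;

/* rp[0..n-1] = up[0..n-1] * v; returns the carry-out limb. */
static limb_t ref_mul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)up[i] * v + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
        mac_steps++;
    }
    return carry;
}

/* rp[0..n-1] += up[0..n-1] * v; returns the carry-out limb.
   The 64-bit sum cannot overflow: (2^32-1)^2 + 2*(2^32-1) = 2^64-1. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
        mac_steps++;
    }
    return carry;
}

/* Schoolbook product rp[0..un+vn-1] = up[0..un-1] * vp[0..vn-1]:
   one mul_1 row, then one addmul_1 row per remaining limb of vp.
   rp must not overlap up or vp. */
static void ref_mul_basecase(limb_t *rp, const limb_t *up, size_t un,
                             const limb_t *vp, size_t vn)
{
    assert(un >= vn && vn >= 1);
    rp[un] = ref_mul_1(rp, up, un, vp[0]);
    for (size_t j = 1; j < vn; j++)
        rp[un + j] = ref_addmul_1(rp + j, up, un, vp[j]);
}

/* Percentage by which a basecase timing exceeds an addmul_1 timing
   over the same number of limb products. */
static double overhead_pct(double basecase_cycles, double addmul_cycles)
{
    return (basecase_cycles - addmul_cycles) / addmul_cycles * 100.0;
}
```

Calling ref_mul_basecase on two 20-limb operands leaves mac_steps at 400, the same count as one ref_addmul_1 over 400 limbs, and overhead_pct(1110, 1031) turns the quoted K8 figures into roughly 7.66%.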




--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to 
mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---
