On Thursday 19 February 2009 01:05:38 Jason Martin wrote:
> On Wed, Feb 18, 2009 at 8:05 PM, <ja...@njkfrudils.plus.com> wrote:
> > On Thursday 19 February 2009 00:49:16 Jason Martin wrote:
> >> On Wed, Feb 18, 2009 at 7:13 PM, <ja...@njkfrudils.plus.com> wrote:
> >> > On Wednesday 18 February 2009 22:03:43 Mariah wrote:
> >> >> gmp-4.2.4  mpir-0.9.0
> >> >>
> >> >> 2241.9     2251        cicero (pentium4-pc-linux-gnu)
> >> >> 3371.5     3369.3      cleo (ia64-unknown-linux-gnu)
> >> >> 6024.5     7437.8      eno (core2-unknown-linux-gnu)
> >> >> 6022.2     7387.1      fulvia (core2-pc-solaris2.10)
> >> >> 3367.8     3369.5      iras (ia64-unknown-linux-gnu)
> >> >> 1341.3     1343.6      mark (ultrasparc3-sun-solaris2.10)
> >> >> 6100       7421.1      menas (core2-unknown-linux-gnu)
> >> >>
> >> >> Mariah
> >> >
> >> > K10 crushes core-2 (intel fanbois hide their heads in shame :)
> >> >
> >> > gmp-4.2.4  mpir-0.9.0  r1614-k8-branch
> >> > 6014       7379        10118            box1 (k8-unknown-linux-gnu) 1.8Ghz
> >> > 9301       11659       15514            cuda1 (k10-unknown-linux-gnu) 2.6Ghz
> >>
> >> I don't think that the core2 can get much faster... the addmul (and
> >> friends) are running just shy of 4 cycles/limb which is the max
> >> throughput rate for the 64-bit multiply instruction on core2. I'm
> >> appropriately hiding in shame :-)
> >
> > A lot of the speed comes from reducing the overhead in mul_basecase. The
> > first mul_basecase I did that used a 2.5c/l addmul loop gave a score of
> > 8200 .
> >
> > How does a 20x20 mul_basecase compair with a 400 limb addmul_1 ?
> >
> > for K8 we have
> > 1110 cycles for 20x20
> > 1031 cycles for 400 limb addmul
> > a 7.6% overhead for basecase over addmul
>
> That code can probably be dropped straight into the core2 code, can't
> it? I haven't looked at your mul_basecase carefully, but as long as
> you aren't using the "inc" or "dec" functions too much, then it should
> support core 2 as well as K10.
>
> --jason

I think there is one "inc" per row, but I'm sure we can replace it with
add $1,xxx. I still have to check that they are all up to speed on the
K10 (I'm sure they are, but...).

If someone can give me access to a core-2 machine, I can go through all
the K8 asm code and see what is worth doing.
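
For anyone following the 20x20 versus 400-limb comparison, here is a rough C sketch of why the two are comparable: a schoolbook basecase multiply is one addmul_1 pass per row, so 20 rows of 20 limbs perform the same 400 limb products as a single 400-limb addmul_1, and whatever is left over is per-row overhead. This is only an illustration and not the K8 asm. The names ref_addmul_1 and ref_mul_basecase are made up for this example, and the 32-bit limbs are an assumption made purely so the code is portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limb type: 32 bits, so a 64-bit integer holds a full product. */
typedef uint32_t limb_t;

/* rp[0..n-1] += ap[0..n-1] * b; returns the carry-out limb. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *ap, size_t n, limb_t b)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* Cannot overflow: (2^32-1)^2 + 2*(2^32-1) == 2^64 - 1. */
        uint64_t t = (uint64_t)ap[i] * b + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
    }
    return carry;
}

/* rp[0..an+bn-1] = ap[0..an-1] * bp[0..bn-1], schoolbook method.
 * rp must not overlap ap or bp. */
void ref_mul_basecase(limb_t *rp, const limb_t *ap, size_t an,
                      const limb_t *bp, size_t bn)
{
    assert(an >= 1 && bn >= 1);

    for (size_t i = 0; i < an + bn; i++)
        rp[i] = 0;

    /* One addmul_1 pass per limb of bp. The loop setup, pointer updates
     * and carry store on each row are the "overhead" over one long addmul_1. */
    for (size_t j = 0; j < bn; j++)
        rp[an + j] = ref_addmul_1(rp + j, ap, an, bp[j]);
}
```

With the K8 numbers quoted above, the overhead works out as (1110 - 1031) / 1031, or a little under 7.7%, spread over 20 rows.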