On Thursday 19 February 2009 00:49:16 Jason Martin wrote:
> On Wed, Feb 18, 2009 at 7:13 PM, <ja...@njkfrudils.plus.com> wrote:
> > On Wednesday 18 February 2009 22:03:43 Mariah wrote:
> >> gmp-4.2.4  mpir-0.9.0
> >>
> >> 2241.9     2251        cicero (pentium4-pc-linux-gnu)
> >> 3371.5     3369.3      cleo (ia64-unknown-linux-gnu)
> >> 6024.5     7437.8      eno (core2-unknown-linux-gnu)
> >> 6022.2     7387.1      fulvia (core2-pc-solaris2.10)
> >> 3367.8     3369.5      iras (ia64-unknown-linux-gnu)
> >> 1341.3     1343.6      mark (ultrasparc3-sun-solaris2.10)
> >> 6100       7421.1      menas (core2-unknown-linux-gnu)
> >>
> >> Mariah
> >
> > K10 crushes core-2 (intel fanbois hide their heads in shame :)
> >
> > gmp-4.2.4  mpir-0.9.0  r1614-k8-branch
> > 6014       7379        10118            box1 (k8-unknown-linux-gnu) 1.8Ghz
> > 9301       11659       15514            cuda1 (k10-unknown-linux-gnu) 2.6Ghz
>
> I don't think that the core2 can get much faster... the addmul (and
> friends) are running just shy of 4 cycles/limb which is the max
> throughput rate for the 64-bit multiply instruction on core2. I'm
> appropriately hiding in shame :-)
>

A lot of the speed comes from reducing the overhead in mul_basecase. The
first mul_basecase I did, which used a 2.5 c/l addmul loop, gave a score
of 8200.

How does a 20x20 mul_basecase compare with a 400-limb addmul_1? Both do
400 limb multiplies. For K8 we have

1110 cycles for a 20x20 mul_basecase
1031 cycles for a 400-limb addmul_1

which is a 7.6% overhead for basecase over addmul.
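
For anyone who wants to redo the arithmetic, here is a rough sketch in Python. It only uses the two K8 cycle counts quoted above; the function and variable names are made up for illustration and are not part of MPIR. It assumes the comparison is fair because a schoolbook 20x20 multiply performs 20 x 20 = 400 limb products, the same number as a single 400-limb addmul_1 pass.

```python
def relative_overhead(basecase_cycles, addmul_cycles):
    """Extra cost of basecase relative to a plain addmul_1 pass, as a fraction."""
    return (basecase_cycles - addmul_cycles) / addmul_cycles


n = 20                    # operand size in limbs (20x20 multiply)
limb_products = n * n     # schoolbook multiply does n*n limb products

basecase_cycles = 1110    # K8 figure quoted above for 20x20 mul_basecase
addmul_cycles = 1031      # K8 figure quoted above for 400-limb addmul_1

overhead = relative_overhead(basecase_cycles, addmul_cycles)

print(f"limb products:  {limb_products}")
print(f"mul_basecase:   {basecase_cycles / limb_products:.3f} cycles/limb")
print(f"addmul_1:       {addmul_cycles / limb_products:.3f} cycles/limb")
print(f"overhead:       {overhead:.2%}")
```

The overhead works out to 79/1031, about 7.66%, which agrees with the figure quoted above up to rounding.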