Martin,

That's all excellent news!! So on the C2D we are caning Magma. But we
should try to figure out whether your Magma version is optimised for
the C2D or for amd64, since that will make a big difference. Is your
machine some kind of 64-bit Intel OS X machine? I don't see a specific
Core 2 version of Magma on their current list. Of course, if you just
had a generic Linux x86 version of Magma, that would be much slower
than optimal.

It's amazing how much difference the SSE makes on your machine. The
AMD essentially uses its MMX/SSE hardware to read in cache lines, I
believe, so unless you are doing something requiring lots of wide
arithmetic/logic, you aren't going to get anything more out of the
chip.
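
Just to be concrete, by wide arithmetic/logic I mean something like the
following. This is only a minimal sketch with made-up names, not the
actual library code: one row gets XORed into another 128 bits at a time
using SSE2 intrinsics.

    #include <emmintrin.h>

    /* XOR row src into row dst, two 64-bit words per SSE2 operation.
       wide is the number of words per row. */
    void row_xor_sse2(unsigned long *dst, const unsigned long *src,
                      long wide)
    {
        long i;
        for (i = 0; i + 2 <= wide; i += 2) {
            __m128i a = _mm_loadu_si128((const __m128i *)(src + i));
            __m128i b = _mm_loadu_si128((const __m128i *)(dst + i));
            _mm_storeu_si128((__m128i *)(dst + i), _mm_xor_si128(a, b));
        }
        for (; i < wide; i++)  /* tail word, if wide is odd */
            dst[i] ^= src[i];
    }

On the Opteron the plain 64-bit word loop should win instead, as per the
5%-10% figures quoted below.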

I look forward to seeing the new code now that you've cleaned it up.
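
Regarding the inlining I suggested below: the shape I have in mind for
the combine is roughly the following. Again just a sketch with a made-up
signature, not your actual code; a combine8 would XOR eight table rows
the same way.

    /* XOR three table rows into dst, word by word. Declared static
       inline so the compiler can fold it into the caller rather than
       paying a function call per row. */
    static inline void combine3(unsigned long *dst,
                                const unsigned long *s1,
                                const unsigned long *s2,
                                const unsigned long *s3,
                                long wide)
    {
        long i;
        for (i = 0; i < wide; i++)
            dst[i] = s1[i] ^ s2[i] ^ s3[i];
    }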

I'm going to try to figure out what GAP does, in case there are any
ideas we missed. It's surely old code, but there might be lots of
interesting things in there.

Anyhow, who would have thought that one would see 1.22s for a
10000x10000 matrix multiply? That's pretty exciting.

Bill.

On 19 May, 21:39, Martin Albrecht <[EMAIL PROTECTED]>
wrote:
> On Monday 19 May 2008, Bill Hart wrote:
>
> > You seemed to be getting up to 8% at points there. That's definitely
> > worth it. I'll be interested to see this evening how it comes out,
> > though I recommend optimising my combine3 function (which I suppose
> > should now be combine8), even including it inline rather than having
> > it in a separate file.
>
> > Of course on the Opteron, SSE should be switched off, since it is
> > definitely slower by about 5%-10% even with careful optimisation.
>
> > Bill.
>
> Okay, I added SSE2 support again and the timings are pretty good on the C2D:
>
> Dimension          Old      New
> 10000 x 10000    2.270    1.720
> 16384 x 16384    9.130    6.760
> 20000 x 20000   16.110   12.310
> 32000 x 32000   64.340   50.690
>
> Throwing parallelism into the mix (still a lame implementation):
>
> Dimension          Old      New
> 10000 x 10000    1.470    1.220
> 16384 x 16384    5.540    4.390
> 20000 x 20000   11.800    8.580
> 32000 x 32000   40.040   32.810
>
> Btw. Mike Hansen pointed out on IRC that GAP has a pretty fast implementation
> of matrix multiplication too:
>
> GAP4, Version: 4.4.10 of 02-Oct-2007, x86_64-unknown-linux-gnu-gcc
> gap> A := RandomMat(10000,10000,GF(2));
> <a 10000x10000 matrix over GF2>
> gap> B := RandomMat(10000,10000,GF(2));
> <a 10000x10000 matrix over GF2>
> gap> C := A*B;
> <a 10000x10000 matrix over GF2>
> gap> time;
> 5951
>
> The unit here is ms, so this takes about 6 seconds. However, the generation
> of random matrices takes forever. Mike also pointed out that for the example
> he tried, GAP is twice as fast as the current Sage code (i.e. the code before
> the improvements discussed in this thread).
>
> On sage.math, things don't improve as expected:
>
> sage: A = random_matrix(GF(2),32000,32000)
> sage: B = random_matrix(GF(2),32000,32000)
> sage: time C = A._multiply_strassen(B,cutoff=2^11)
> CPU times: user 121.69 s, sys: 3.93 s, total: 125.62 s
> Wall time: 125.62
>
> This was 114.620s before.
>
> Martin
>
> --
> name: Martin Albrecht
> _pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
> _www: http://www.informatik.uni-bremen.de/~malb
> _jab: [EMAIL PROTECTED]