That's looking good. Would you like me to run it on an unburdened
Opteron to see how it goes? If you like you can send me a tarball and
I'll try it out.

I think our best bet for a significant improvement now is the idea of
using two Gray tables of half the size simultaneously. I also realised
it possibly improves the cache performance for the A matrix too.
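
To make the two-table idea concrete, here is a rough sketch in Python
rather than the actual M4RI C code. Rows are stored as Python ints (bit j
of a row is column j); the names gray_table, mul_two_tables and mul_naive
are made up for illustration, and k is an arbitrary block size. Each pass
builds two tables of 2^k entries from 2k rows of B, so every row of C is
read and written once per 2k rows of B instead of once per k rows.

```python
def gray_table(rows):
    """All 2^k XOR-combinations of `rows`, indexed by bitmask.

    Entries are filled in Gray-code order, so each new entry costs
    exactly one row addition (XOR).
    """
    k = len(rows)
    table = [0] * (1 << k)
    prev = 0
    for i in range(1, 1 << k):
        gray = i ^ (i >> 1)
        changed = (gray ^ prev).bit_length() - 1  # the single bit that flipped
        table[gray] = table[prev] ^ rows[changed]
        prev = gray
    return table


def mul_two_tables(A, B, k=4):
    """C = A*B over GF(2), using two Gray tables per pass (hypothetical sketch).

    A is a list of row ints with len(B) columns; B is a list of row ints.
    """
    C = [0] * len(A)
    mask = (1 << k) - 1
    for start in range(0, len(B), 2 * k):
        t0 = gray_table(B[start:start + k])
        t1 = gray_table(B[start + k:start + 2 * k])
        for i, a in enumerate(A):
            bits = a >> start
            # One update of C[i] consumes 2k bits of the row of A.
            C[i] ^= t0[bits & mask] ^ t1[(bits >> k) & mask]
    return C


def mul_naive(A, B):
    """Reference product: add row j of B wherever bit j of the A row is set."""
    C = []
    for a in A:
        row = 0
        for j, b in enumerate(B):
            if (a >> j) & 1:
                row ^= b
        C.append(row)
    return C
```

Two tables of 2^k entries cover the same 2k rows of B as a single table
of 2^(2k) entries would, which is why the tables can stay small enough
to sit in cache.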

I was casually wondering whether Magma might use a highly optimised
Winograd's algorithm instead of the naive algorithm. But over GF(2) I
think it would probably take longer, since it basically replaces
n^2 full-length scalar multiplies with n^2 half-length ones plus 2*n^2
half-row additions, and adds a pile of other overhead.
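
For reference, a small sketch of the operation count above, assuming
that the variant meant here is Winograd's inner-product identity (that
reading is an assumption) and that n is even. Over GF(2) subtraction is
XOR and multiplication is AND. Vectors are Python ints, and the helper
names are hypothetical.

```python
def parity(x):
    """Sum of the bits of x modulo 2."""
    return bin(x).count("1") & 1


def dot_naive(x, y):
    """Full-length scalar product over GF(2)."""
    return parity(x & y)


def split_even_odd(x, n):
    """Pack entries 0, 2, 4, ... into one int and entries 1, 3, 5, ... into another."""
    even = odd = 0
    for j in range(n // 2):
        even |= ((x >> (2 * j)) & 1) << j
        odd |= ((x >> (2 * j + 1)) & 1) << j
    return even, odd


def dot_winograd(x, y, n):
    """Scalar product via Winograd's identity; n must be even."""
    x0, x1 = split_even_odd(x, n)
    y0, y1 = split_even_odd(y, n)
    xi = parity(x0 & x1)   # depends on x only: computed once per row of A
    eta = parity(y0 & y1)  # depends on y only: computed once per column of B
    # Per entry of the product: two half-length additions, one half-length multiply.
    return parity((x0 ^ y1) & (x1 ^ y0)) ^ xi ^ eta
```

Over a field where multiplication is expensive this trade pays off, but
over GF(2) an AND costs the same as an XOR, so each entry goes from one
full-length word operation to three half-length ones.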

Bill.

On 17 May, 20:32, Martin Albrecht <[EMAIL PROTECTED]>
wrote:
> On Saturday 17 May 2008, Martin Albrecht wrote:
>
> > > I think a better idea would be to explicitly force all matrices and
> > > all rows to be 128 bit aligned if the matrices are wide enough to
> > > benefit from SSE2. Then the combine function can always use SSE2 and
> > > there will be no need to check for alignment.
>
> > That doesn't seem to make a noticeable difference for me (on C2D). However,
> > I realised that the multiplications are faster where the target matrix is a
> > real matrix rather than a window (which has bad data locality). Copying
> > everything over doesn't seem like a good idea, but it at least indicates an
> > area for improvement.
>
> Okay, if I only copy when we cross over to M4RM, then the memory overhead is
> constant (~ cutoff^2) and the performance still improves.
>
> Old: 64-bit Debian/GNU Linux, 2.33GHz Core2Duo
> Matrix Dimension        Magma 2.14-13 (64-bit)  M4RI-20080517 (64-bit)
> 10,000 x 10,000         2.920                   3.610
> 16,384 x 16,384         11.140                  12.120
> 20,000 x 20,000         20.370                  24.390
> 32,000 x 32,000         74.290                  94.910
>
> New: 64-bit Debian/GNU Linux, 2.33GHz Core2Duo
> Matrix Dimension        Magma 2.14-13 (64-bit)  M4RI-20080517 (64-bit)
> 10,000 x 10,000         2.920                   2.990
> 16,384 x 16,384         11.140                  11.750
> 20,000 x 20,000         20.370                  21.180
> 32,000 x 32,000         74.290                  86.570
>
> On Opteron things don't look this way, but I think sage.math is pretty heavily
> used right now, so my benchmarks there are not very telling.
>
> Martin
>
> --
> name: Martin Albrecht
> _pgp:http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
> _www:http://www.informatik.uni-bremen.de/~malb
> _jab: [EMAIL PROTECTED]