|
Hi,
Can't
say that I'm a great expert on this subject, but I've dug into this in the past.
As far
as I recall the Strassen Algorithm does only work with square matrices as you
say. Also from dim recollections of the past the speed improvement of the
Strassen algorithm really isn't that great relative to the complexity of
implementing it (I think it scales as O(n^log2(7)) which is about O(n^2.8)
versus the brute-force algorithm's O(n^3). This is a dramatically poorer
complexity improvement than (say) switching between a bubblesort and a
quicksort. I do recall reading about algorithms by Coppersmith and Winograd
(well I have a note in an old log book that I've dug up!) that was a touch
more efficient, but again not vastly so.
To be
honest I was never convinced that the number of additions and multiplications
was really the limiting factor in such large matrix multiplications. My
suspicion was that the main performance limitation was the fact that the
matrices in question became far too large to fit into the cache (or indeed the
installed memory!), and consequently the brute force algorithm's performance was
being limited much more by hardware than by any (overwhelming) inefficiency. My
feeling was that if one could organise access of the data, so that all the
values being used were prefetched into the cache one would gain a far more
substantial performance boost than moving away from the brute-force approach. In
the end I gave up and used the Intel Math Kernel Library (it was free back
then). This implements standard BLAS routines in a manner which always
exceeded the performance of my code. For what I wanted the peformance was
acceptable, although I don't recall ever needing matrices as large as
yours.
The
more recent versions of the Intel Math Kernel Library aren't free (although I
think there's a free trial version so you can see if it gives you adequate
performance). But the newer versions are able to scale across multiple
processors in a machine (matrix multiplication is a very paralellisable
operation) and include optimised versions for all flavours of Pentium
processors, Itanium processors and also have been ported to
Linux.
Hope
that helps
Daniel
[Daniel Robinson] -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]On Behalf Of Juan Carlos Sent: 25 October 2004 18:14 To: [EMAIL PROTECTED] Subject: [msvc] Matrix multiplication
|
_______________________________________________ msvc mailing list [EMAIL PROTECTED] See http://beginthread.com/mailman/listinfo/msvc_beginthread.com for subscription changes, and list archive.
