For big matrices, using all the caches well is very important.  The
usual approach is block tiling as you say.  You want two blocks to fit
nicely in the L1 cache.  The most highly optimized schemes will have a
hierarchy of tiles where two tiles at the second level will fit in the
L2 cache, etc. Additional levels can be based on memory interfaces,
interleaving, page size, and even cylinder size on disk (for really
huge matrices). The idea is _never_ to generate more cache misses than
you need to. A miss causes a factor of 10 to 10000 performance
multiple on that operation.

Multiplying within the innermost blocks should also consider prefetch:
you'll get best performance touching locations in contiguous ascending
order.  The inner loops are

for i = 1 to ma
  for j = 1 to nb
    for k = 1 to na
      r[i,j] += a[i,k] * b[k,j]

This ignores that r[i,j] needs to be zeroed out somewhere.  But with
this assumption, the loops can be juxtaposed in any order without
changing the result. You should explore the 6 possible orderings for
the best one.  Of course you also have to zero out the sums in the
best possible manner.

FWIW, a GPU will normally outperform a general purpose CPU with ease
on this problem.  Since even cell phones are getting GPUs these days,
tweaking single-processor matrix code is a dying art.

Cheers.

On Feb 27, 6:57 pm, Arun Vishwanathan <aaron.nar...@gmail.com> wrote:
> Hi all,
>
> We have this challenge to make the fastest executing serial matrix
> multiplication code. I have tried using matrix transpose( in C for row
> major ) and loop unrolling.I was able to obtain little speedup. Does anyone
> have any hints/papers that I could read upon and try to speed up further?I
> had tried a bit of block tiling but was not successful.
>
> Thanks
> Arun

-- 
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to algogeeks@googlegroups.com.
To unsubscribe from this group, send email to 
algogeeks+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/algogeeks?hl=en.

Reply via email to