You can get an initial guess with math.  The rest you'll need to
determine by experiment.  The math is

Block Dimension = sqrt( L1 / sizeof(FLOAT) / K )

where K is 2 or 3 depending on the loop order: two blocks (one each of A
and B) must fit in L1 at once, or three if a block of C is kept resident
as well.  One reason you can't predict perfectly is that interrupt
handlers can add to cache load in unpredictable ways.  You might find it
interesting to turn off your network and see whether you can get further
improvement by making blocks a little bigger than you could with the
network turned on.
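
A minimal C sketch of that initial guess, using placeholder numbers (a
32 KB L1 data cache, 8-byte doubles, K = 3); adjust them to your own
machine:

#include <math.h>
#include <stdio.h>

/* Placeholder parameters: 32 KB L1 data cache, double elements, and
   K = 3 so that one block each of A, B, and C fits at once. */
#define L1_BYTES (32 * 1024)
#define K_BLOCKS 3

int main(void)
{
    double elems_per_block = (double)L1_BYTES / sizeof(double) / K_BLOCKS;
    int bs = (int)floor(sqrt(elems_per_block));

    /* Round down to a multiple of 8 so a block row spans whole 64-byte
       cache lines; treat the result as a starting point for experiments. */
    bs -= bs % 8;

    printf("initial block size guess: %d\n", bs);  /* 36, rounded to 32 */
    return 0;
}

For the 16 KB L1 and 8-byte elements mentioned below, K = 2 gives
sqrt( 16384 / 8 / 2 ) = 32, which matches the block size that worked
well in the experiment described there.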


On Mar 5, 2:08 pm, Arun Vishwanathan <aaron.nar...@gmail.com> wrote:
> @Gene: If all matrices are N x N and the size of my L1 cache is L1, then
> if I need a block of A and a block of B to fit in cache, would I need
> 2 x (blocksize)^2 x 8 (bytes per data item, for example) = L1? Or would I
> need 3 (including the C block) x (blocksize)^2 x 8 = L1? I'm confused.
> I tried a sample and I think I got a reasonably good speedup with block
> size 32 (for matrix dimensions 512, 1024, etc.) for my L1 of 16 KB and
> L2 of 256 KB... Any comments or inferences?
>
> On Wed, Feb 29, 2012 at 9:31 AM, Arun Vishwanathan
> <aaron.nar...@gmail.com>wrote:
>
> > @all: Thanks a lot
>
> > On Wed, Feb 29, 2012 at 9:02 AM, Gene <gene.ress...@gmail.com> wrote:
>
> >> For big matrices, using all the caches well is very important.  The
> >> usual approach is block tiling as you say.  You want two blocks to fit
> >> nicely in the L1 cache.  The most highly optimized schemes will have a
> >> hierarchy of tiles where two tiles at the second level will fit in the
> >> L2 cache, etc. Additional levels can be based on memory interfaces,
> >> interleaving, page size, and even cylinder size on disk (for really
> >> huge matrices). The idea is _never_ to generate more cache misses than
> >> you need to. A miss can make the operation that incurs it 10 to
> >> 10,000 times slower.
>
> >> Multiplying within the innermost blocks should also consider prefetch:
> >> you'll get best performance touching locations in contiguous ascending
> >> order.  The inner loops are
>
> >> for i = 1 to ma
> >>  for j = 1 to nb
> >>    for k = 1 to na
> >>      r[i,j] += a[i,k] * b[k,j]
>
> >> This ignores that r[i,j] needs to be zeroed out somewhere.  But with
> >> this assumption, the loops can be nested in any order without
> >> changing the result. You should explore the 6 possible orderings for
> >> the best one.  Of course you also have to zero out the sums in the
> >> best possible manner.
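
Here is a minimal C sketch of one level of the blocking described above.
It assumes square N x N row-major matrices, N a multiple of the block
size BS (both placeholder values), and r[] already zeroed; the innermost
loops use i-k-j order so that b and r are walked at contiguous ascending
addresses:

/* One level of block tiling.  N and BS are placeholder values; BS would
   come from the L1 estimate above. */
#define N  1024
#define BS 32

void blocked_matmul(const double *a, const double *b, double *r)
{
    for (int ii = 0; ii < N; ii += BS)
        for (int kk = 0; kk < N; kk += BS)
            for (int jj = 0; jj < N; jj += BS)
                /* Multiply one BS x BS block of a by one of b,
                   accumulating into a block of r.  The i-k-j order
                   streams along rows of b and r, i.e. contiguous
                   ascending addresses, which the prefetcher likes. */
                for (int i = ii; i < ii + BS; i++)
                    for (int k = kk; k < kk + BS; k++) {
                        double aik = a[i * N + k];
                        for (int j = jj; j < jj + BS; j++)
                            r[i * N + j] += aik * b[k * N + j];
                    }
}

Swapping the block loops (ii, kk, jj) or the inner three gives the other
orderings worth timing against each other.
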
>
> >> FWIW, a GPU will normally outperform a general purpose CPU with ease
> >> on this problem.  Since even cell phones are getting GPUs these days,
> >> tweaking single-processor matrix code is a dying art.
>
> >> Cheers.
>
> >> On Feb 27, 6:57 pm, Arun Vishwanathan <aaron.nar...@gmail.com> wrote:
> >> > Hi all,
>
> >> > We have this challenge to write the fastest-executing serial matrix
> >> > multiplication code. I have tried using a matrix transpose (in C, for
> >> > row-major order) and loop unrolling, and was able to obtain a little
> >> > speedup. Does anyone have any hints/papers that I could read up on to
> >> > speed things up further? I had tried a bit of block tiling but was not
> >> > successful.
>
> >> > Thanks
> >> > Arun
>
>
> --
>  "People often say that motivation doesn't last. Well, neither does bathing
> - that's why we recommend it daily."
