@Arun: I'm assuming that you are optimizing for large matrices. Try
making your basic operation a 4x4 matrix multiplication, written out
(i.e., without loops), where the 4x4 matrices are subarrays of the
operand arrays. This should give you good cache utilization and data
reuse, as each element of the two 4x4 operand matrices will be used 4
times. You will have to clean up around the edges if the matrix
dimensions are not multiples of 4.
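
Here is a rough sketch of what I mean, in C. It is only an illustration, not tuned code: I am assuming square n x n matrices of doubles in row-major order, and the names ROW4, kernel4x4 and matmul_blocked4 are made up for this example. The 4x4 kernel is written out without loops, and a plain loop cleans up the edges when n is not a multiple of 4.

```c
#include <stddef.h>

/* One row of the 4x4 block product, written out without loops:
   c[i][0..3] += a[i][0..3] * b[0..3][0..3].
   ld is the distance in elements between consecutive rows. */
#define ROW4(i)                                                   \
    do {                                                          \
        const double a0 = a[(i) * ld + 0], a1 = a[(i) * ld + 1];  \
        const double a2 = a[(i) * ld + 2], a3 = a[(i) * ld + 3];  \
        c[(i) * ld + 0] += a0 * b[0 * ld + 0] + a1 * b[1 * ld + 0] \
                         + a2 * b[2 * ld + 0] + a3 * b[3 * ld + 0]; \
        c[(i) * ld + 1] += a0 * b[0 * ld + 1] + a1 * b[1 * ld + 1] \
                         + a2 * b[2 * ld + 1] + a3 * b[3 * ld + 1]; \
        c[(i) * ld + 2] += a0 * b[0 * ld + 2] + a1 * b[1 * ld + 2] \
                         + a2 * b[2 * ld + 2] + a3 * b[3 * ld + 2]; \
        c[(i) * ld + 3] += a0 * b[0 * ld + 3] + a1 * b[1 * ld + 3] \
                         + a2 * b[2 * ld + 3] + a3 * b[3 * ld + 3]; \
    } while (0)

/* 4x4 block update. a, b and c point at the top-left element of a
   4x4 subarray inside the full row-major operand arrays. Each element
   of the a and b blocks is used 4 times. */
static void kernel4x4(const double *a, const double *b, double *c, size_t ld)
{
    ROW4(0);
    ROW4(1);
    ROW4(2);
    ROW4(3);
}

/* C = A * B for n x n row-major matrices (hypothetical driver). */
void matmul_blocked4(size_t n, const double *A, const double *B, double *C)
{
    const size_t nb = n - n % 4;   /* largest multiple of 4 that is <= n */
    size_t i, j, k;

    for (i = 0; i < n * n; i++)
        C[i] = 0.0;

    /* Interior: every 4x4 block of C accumulates products of 4x4 blocks. */
    for (i = 0; i < nb; i += 4)
        for (j = 0; j < nb; j += 4)
            for (k = 0; k < nb; k += 4)
                kernel4x4(&A[i * n + k], &B[k * n + j], &C[i * n + j], n);

    /* Edge cleanup: elements inside the blocked region still need the
       terms k = nb..n-1; elements outside it need every k. */
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            double sum = 0.0;
            for (k = (i < nb && j < nb) ? nb : 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] += sum;
        }
    }
}
```

Calling matmul_blocked4(n, A, B, C) overwrites C with the product. Whether this beats your transposed version will depend on your compiler and machine, so time it on the sizes you care about.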

Dave

On Feb 27, 5:57 pm, Arun Vishwanathan <aaron.nar...@gmail.com> wrote:
> Hi all,
>
> We have this challenge to make the fastest executing serial matrix
> multiplication code. I have tried using a matrix transpose (in C, for
> row-major storage) and loop unrolling. I was able to obtain little speedup.
> Does anyone have any hints or papers that I could read up on to speed it up
> further? I tried a bit of block tiling but was not successful.
>
> Thanks
> Arun

