@Arun: I'm assuming that you are optimizing for large matrices. Try making your basic operation a 4x4 matrix multiplication written out in full (i.e., without loops), where the 4x4 matrices are subarrays of the operand arrays. This should give you good cache utilization and data reuse, since each element of the two 4x4 operand blocks is loaded once and then used 4 times. You will have to clean up around the edges if the matrix dimensions are not multiples of 4.
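A minimal C sketch of this idea follows. It assumes square n x n matrices of doubles stored row-major, and the names mul4x4_add and matmul_4x4_blocked are hypothetical, chosen only for illustration. The 4x4 kernel loads 16 elements of A and 16 of B once and uses each of them 4 times. The scalar pass at the end does the edge cleanup when n is not a multiple of 4. It does no cache tiling beyond the 4x4 blocks, so for very large matrices you may still want to wrap it in larger blocks.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: C += A*B for one 4x4 block. A, B and C point at the
 * top-left corner of 4x4 sub-blocks inside n x n row-major arrays, so each
 * block row starts n elements after the previous one. Written out, no loops. */
static void mul4x4_add(const double *A, const double *B, double *C, int n)
{
    const double *A0 = A, *A1 = A + n, *A2 = A + 2 * n, *A3 = A + 3 * n;
    const double *B0 = B, *B1 = B + n, *B2 = B + 2 * n, *B3 = B + 3 * n;
    double *C0 = C, *C1 = C + n, *C2 = C + 2 * n, *C3 = C + 3 * n;

    /* Load every operand element once; each is then used 4 times below. */
    double a00 = A0[0], a01 = A0[1], a02 = A0[2], a03 = A0[3];
    double a10 = A1[0], a11 = A1[1], a12 = A1[2], a13 = A1[3];
    double a20 = A2[0], a21 = A2[1], a22 = A2[2], a23 = A2[3];
    double a30 = A3[0], a31 = A3[1], a32 = A3[2], a33 = A3[3];
    double b00 = B0[0], b01 = B0[1], b02 = B0[2], b03 = B0[3];
    double b10 = B1[0], b11 = B1[1], b12 = B1[2], b13 = B1[3];
    double b20 = B2[0], b21 = B2[1], b22 = B2[2], b23 = B2[3];
    double b30 = B3[0], b31 = B3[1], b32 = B3[2], b33 = B3[3];

    /* C[r][c] += sum over k of a[r][k] * b[k][c] */
    C0[0] += a00*b00 + a01*b10 + a02*b20 + a03*b30;
    C0[1] += a00*b01 + a01*b11 + a02*b21 + a03*b31;
    C0[2] += a00*b02 + a01*b12 + a02*b22 + a03*b32;
    C0[3] += a00*b03 + a01*b13 + a02*b23 + a03*b33;
    C1[0] += a10*b00 + a11*b10 + a12*b20 + a13*b30;
    C1[1] += a10*b01 + a11*b11 + a12*b21 + a13*b31;
    C1[2] += a10*b02 + a11*b12 + a12*b22 + a13*b32;
    C1[3] += a10*b03 + a11*b13 + a12*b23 + a13*b33;
    C2[0] += a20*b00 + a21*b10 + a22*b20 + a23*b30;
    C2[1] += a20*b01 + a21*b11 + a22*b21 + a23*b31;
    C2[2] += a20*b02 + a21*b12 + a22*b22 + a23*b32;
    C2[3] += a20*b03 + a21*b13 + a22*b23 + a23*b33;
    C3[0] += a30*b00 + a31*b10 + a32*b20 + a33*b30;
    C3[1] += a30*b01 + a31*b11 + a32*b21 + a33*b31;
    C3[2] += a30*b02 + a31*b12 + a32*b22 + a33*b32;
    C3[3] += a30*b03 + a31*b13 + a32*b23 + a33*b33;
}

/* Hypothetical entry point: C = A*B for n x n row-major matrices. */
void matmul_4x4_blocked(const double *A, const double *B, double *C, int n)
{
    assert(A && B && C && n >= 0);
    int n4 = n - n % 4;                 /* largest multiple of 4 <= n */
    memset(C, 0, (size_t)n * n * sizeof *C);

    /* Main part: 4x4 blocks for rows, columns and k all below n4. */
    for (int i = 0; i < n4; i += 4)
        for (int j = 0; j < n4; j += 4)
            for (int k = 0; k < n4; k += 4)
                mul4x4_add(&A[i*n + k], &B[k*n + j], &C[i*n + j], n);

    /* Edge cleanup: the leftover k range for the blocked region, and the
     * full k range for the leftover rows and columns. */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            int k0 = (i < n4 && j < n4) ? n4 : 0;
            double s = 0.0;
            for (int k = k0; k < n; k++)
                s += A[i*n + k] * B[k*n + j];
            C[i*n + j] += s;
        }
}
```

Comparing the result against a plain triple loop for a size like n = 5 is a quick way to check that the edge cleanup is right before timing anything.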
Dave

On Feb 27, 5:57 pm, Arun Vishwanathan <aaron.nar...@gmail.com> wrote:
> Hi all,
>
> We have this challenge to make the fastest executing serial matrix
> multiplication code. I have tried using matrix transpose (in C, for row
> major) and loop unrolling. I was able to obtain little speedup. Does anyone
> have any hints/papers that I could read up on to speed it up further? I
> had tried a bit of block tiling but was not successful.
>
> Thanks
> Arun

-- 
You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group.
To post to this group, send email to algogeeks@googlegroups.com.
To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.