Look more carefully at the pattern of your code. It seems you may be able to reshape v1 into an (m, n) matrix and v0 into an (n, m) matrix, and then do a single matrix-matrix multiplication instead of looping over multiple strided vectors.
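Concretely, here is a minimal sketch of that equivalence in current Julia syntax (the sizes and data are made up; `stemp`, `v0`, `n`, `m` follow the names in your loop). The columns of reshape(v0, n, m) are exactly the contiguous blocks you slice out, and the strided writes into v1 correspond to a transpose:

```julia
n, m = 3, 4
stemp = rand(n, n)     # stand-in for your n-by-n matrix
v0 = rand(n * m)

# the original strided loop
v1_loop = similar(v0)
for i in 1:m
    v1_loop[m .* (0:n-1) .+ i] = stemp * v0[n * (i-1) .+ (1:n)]
end

# one matrix-matrix multiply: multiply all m blocks at once,
# then transpose to match the strided write pattern
v1_mat = vec(permutedims(stemp * reshape(v0, n, m)))

v1_loop ≈ v1_mat   # true
```

This replaces m small GEMV calls with one GEMM, which BLAS handles much more efficiently.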
Also, if you are using ArrayViews, you can write view(v0, n*(i-1)+1:n*i) instead of v0[(n * (i-1)) + (1:n)].

Dahua

On Tuesday, July 29, 2014 4:22:32 PM UTC-5, Florian Oswald wrote:
>
> Hi all,
>
> I've got an algorithm that hinges critically on fast matrix
> multiplication. I put up the function in this gist
>
> https://gist.github.com/floswald/6dea493417912536688d#file-tensor-jl-L45
>
> indicating the line (45) that takes most of the time, as you can see in
> the profile output that is there as well. I am trying to figure out if I'm
> doing something wrong here or if that line just takes as long as it takes.
> I have to do this many times, so if this takes too long I have to change my
> strategy.
>
> The core of the problem looks like this:
>
> for imat in 2:nbm
>     v0 = copy(v1)
>     stemp = ibm[ks[imat]]
>     n = size(stemp,1)
>     m = nall / n
>     for i in 1:m
>         v1[m*(0:(n-1)) + i] = stemp * v0[(n*(i-1)) + (1:n)]
>     end
> end
>
> where the v's are vectors and stemp is a matrix. I spend a lot of time in
> the matrix multiplication line in the innermost loop. Any suggestions would
> be much appreciated. Thanks!