In any case, this does make me wonder what is going on under the hood. I would not even call the "vectorized" code vectorized: it is just matrix-matrix multiplication, which IMHO should be handed straight to BLAS without overhead. Something appears to be creating a bunch of temporaries.
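The point about temporaries can be sketched as follows. This is a minimal illustration in plain Python rather than Julia, and the function names (`matmul_alloc`, `add_alloc`, `gemm_vectorized_style`, `gemm_in_place`) are hypothetical. It does not reproduce the code in the gist, and it does not diagnose why the loop version there allocated more. It only shows where temporaries come from: evaluating `A*B + D` as an expression builds one new matrix for the product and another for the sum, while an in-place routine writes into a buffer the caller allocates once and reuses. Both versions here use explicit loops; the difference is only where the result lives.

```python
from typing import List

Matrix = List[List[float]]


def matmul_alloc(a: Matrix, b: Matrix) -> Matrix:
    """Return a brand-new matrix holding a*b (one allocation per call)."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        row_out = out[i]
        for p in range(k):
            aip = a[i][p]
            row_b = b[p]
            for j in range(m):
                row_out[j] += aip * row_b[j]
    return out


def add_alloc(x: Matrix, y: Matrix) -> Matrix:
    """Return a brand-new matrix holding x + y (another allocation)."""
    return [[xi + yi for xi, yi in zip(rx, ry)] for rx, ry in zip(x, y)]


def gemm_vectorized_style(a: Matrix, b: Matrix, d: Matrix) -> Matrix:
    # Expression style: a temporary for a*b, then a second matrix for the sum.
    return add_alloc(matmul_alloc(a, b), d)


def gemm_in_place(a: Matrix, b: Matrix, d: Matrix, out: Matrix) -> Matrix:
    """Overwrite the preallocated `out` with a*b + d; no new matrices are built."""
    n, k, m = len(a), len(b), len(b[0])
    for i in range(n):
        row_out = out[i]
        row_d = d[i]
        # Start from d, then accumulate the product into the same row.
        for j in range(m):
            row_out[j] = row_d[j]
        for p in range(k):
            aip = a[i][p]
            row_b = b[p]
            for j in range(m):
                row_out[j] += aip * row_b[j]
    return out


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    D = [[1.0, 0.0], [0.0, 1.0]]
    print(gemm_vectorized_style(A, B, D))
    buf = [[0.0, 0.0], [0.0, 0.0]]  # allocated once, reused across calls
    for _ in range(3):
        gemm_in_place(A, B, D, buf)
    print(buf)
```

In an element loop like a finite element assembly, the expression style allocates fresh matrices on every iteration, whereas the in-place style allocates `buf` once outside the loop.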
On Thursday, December 11, 2014 5:47:01 PM UTC+1, Petr Krysl wrote:
>
> Acting upon the advice that replacing matrix-matrix multiplications in
> vectorized form with loops would help with performance, I chopped out a
> piece of code from my finite element solver
> (https://gist.github.com/anonymous/4ec426096c02faa4354d) and ran some
> tests with the following results:
>
> Vectorized code:
> elapsed time: 0.326802682 seconds (134490340 bytes allocated, 17.06% gc time)
>
> Loops code:
> elapsed time: 4.681451441 seconds (997454276 bytes allocated, 9.05% gc time)
>
> SLOWER and using MORE memory?!
>
> I must be doing something terribly wrong.
>
> Petr