Hi Petr,

I just tried devectorizing the problem, although I took a slightly different route:
https://gist.github.com/rleegates/2d99e6251fe246b017ac

I am not sure that this is what you intended. However, using the vectorized code as a reference, I obtain the same results up to machine epsilon.
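To illustrate the general idea, here is a minimal sketch. It is not the code from the gist. The function names `ke_vectorized!` and `ke_loops!`, the arguments `B`, `D`, `w`, and the scratch buffer `DB` are all made up for illustration, and I am assuming the hot spot is an update of the form Ke += w * B' * D * B. The vectorized form allocates temporaries for every product, while the loop form writes into preallocated arrays and skips bounds checks with `@inbounds`.

```julia
# Hypothetical stand-in for an element-matrix update: Ke += w * B' * D * B.
# B is m-by-n, D is m-by-m, Ke is n-by-n, w is a scalar weight.

# Vectorized: each matrix product allocates a temporary array.
function ke_vectorized!(Ke, B, D, w)
    Ke .+= w .* (B' * D * B)
    return Ke
end

# Devectorized: explicit loops, no temporaries.
# DB is a preallocated m-by-n scratch buffer supplied by the caller.
function ke_loops!(Ke, B, D, DB, w)
    m, n = size(B)
    # DB = D * B
    @inbounds for j in 1:n, i in 1:m
        s = 0.0
        for k in 1:m
            s += D[i, k] * B[k, j]
        end
        DB[i, j] = s
    end
    # Ke += w * B' * DB
    @inbounds for j in 1:n, i in 1:n
        s = 0.0
        for k in 1:m
            s += B[k, i] * DB[k, j]
        end
        Ke[i, j] += w * s
    end
    return Ke
end
```

To compare the two, call each function once as a warm-up so that compilation is not included, then wrap repeated calls in `@time`. The loops only pay off if everything lives inside a function and the scratch buffer is allocated once outside the hot loop.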
Anyways, I got:

In [4]: keTest(200_000)
Vectorized:
elapsed time: 0.426404203 seconds (140804768 bytes allocated, 22.42% gc time)
DeVectorized:
elapsed time: 0.078519349 seconds (128 bytes allocated)
DeVectorized InBounds:
elapsed time: 0.032812311 seconds (128 bytes allocated)
Error norm deVec: 0.0
Error norm inBnd: 0.0

On Thursday, December 11, 2014 5:47:01 PM UTC+1, Petr Krysl wrote:
> Acting upon the advice that replacing matrix-matrix multiplications in
> vectorized form with loops would help with performance, I chopped out a
> piece of code from my finite element solver
> (https://gist.github.com/anonymous/4ec426096c02faa4354d) and ran some
> tests with the following results:
>
> Vectorized code:
> elapsed time: 0.326802682 seconds (134490340 bytes allocated, 17.06% gc time)
>
> Loops code:
> elapsed time: 4.681451441 seconds (997454276 bytes allocated, 9.05% gc time)
>
> SLOWER and using MORE memory?!
>
> I must be doing something terribly wrong.
>
> Petr