Robert,

This is very nice. Basically it confirms that if every single variable is properly declared, so that the compiler can make all its optimizations, then the loops have a chance of beating the vectorized code.
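To make the point above concrete, here is a minimal sketch. It is not the code from either gist: the function names `ke_vectorized!` and `ke_loops!`, the matrix sizes, and the weight `w` are made up for illustration. It assumes a current Julia version. The type annotations mostly serve as documentation; what I believe matters is that the loops live inside a function and that every local variable keeps one concrete type.

```julia
using LinearAlgebra

# Vectorized form (hypothetical name): each call allocates temporaries
# for the intermediate matrix products.
function ke_vectorized!(Ke::Matrix{Float64}, B::Matrix{Float64},
                        D::Matrix{Float64}, w::Float64)
    Ke .+= (B' * D * B) .* w
    return Ke
end

# Devectorized form (hypothetical name): explicit loops, no temporaries.
function ke_loops!(Ke::Matrix{Float64}, B::Matrix{Float64},
                   D::Matrix{Float64}, w::Float64)
    m, n = size(B)                      # B is m-by-n
    # Check sizes up front, because @inbounds skips bounds checking below.
    @assert size(D) == (m, m) && size(Ke) == (n, n)
    @inbounds for j in 1:n, i in 1:n
        acc = 0.0                       # Float64 accumulator, type-stable
        for l in 1:m, k in 1:m
            acc += B[k, i] * D[k, l] * B[l, j]   # entry (i, j) of B' * D * B
        end
        Ke[i, j] += acc * w
    end
    return Ke
end

# Compare the two forms on small random inputs.
B = rand(3, 6); D = rand(3, 3); w = 0.5
Ke1 = ke_vectorized!(zeros(6, 6), B, D, w)
Ke2 = ke_loops!(zeros(6, 6), B, D, w)
println("Error norm: ", norm(Ke1 - Ke2))
```

Wrapping a call in `@time` after a warm-up run should show whether the loop version still allocates; if it does, some variable is probably changing type or being read from global scope.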
I got a bit lost in the follow-up discussion: I think the message chain might have been broken.

Petr

On Thursday, December 11, 2014 2:05:40 PM UTC-8, Robert Gates wrote:
>
> Hi Petr,
>
> I just tried the devectorized problem, although I did choose to go a bit of a different route:
> https://gist.github.com/rleegates/2d99e6251fe246b017ac
> I am not sure that this is what you intended, however, using the vectorized code as a reference, I do obtain the same results up to machine epsilon.
>
> Anyways, I got:
>
> In [4]: keTest(200_000)
> Vectorized:
> elapsed time: 0.426404203 seconds (140804768 bytes allocated, 22.42% gc time)
> DeVectorized:
> elapsed time: 0.078519349 seconds (128 bytes allocated)
> DeVectorized InBounds:
> elapsed time: 0.032812311 seconds (128 bytes allocated)
> Error norm deVec: 0.0
> Error norm inBnd: 0.0
>
> On Thursday, December 11, 2014 5:47:01 PM UTC+1, Petr Krysl wrote:
>>
>> Acting upon the advice that replacing matrix-matrix multiplications in vectorized form with loops would help with performance, I chopped out a piece of code from my finite element solver (https://gist.github.com/anonymous/4ec426096c02faa4354d) and ran some tests with the following results:
>>
>> Vectorized code:
>> elapsed time: 0.326802682 seconds (134490340 bytes allocated, 17.06% gc time)
>>
>> Loops code:
>> elapsed time: 4.681451441 seconds (997454276 bytes allocated, 9.05% gc time)
>>
>> SLOWER and using MORE memory?!
>>
>> I must be doing something terribly wrong.
>>
>> Petr