Dear Andreas,

Thank you very much. True, I had not noticed that. I put the definitions of the arrays outside of the two functions so that their results could be compared.
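
For anyone following the thread, here is a minimal sketch of what I understand the difference to be. It is not the code from the gist: the names `A`, `sum_global`, and `sum_local` are made up for illustration, and I am assuming the problem is the usual one with non-constant global variables in Julia.

```julia
# Hypothetical illustration only; not the code from the gist.

A = rand(3, 3)   # non-constant global: the compiler cannot assume its type

# Loop that reads the global array directly.
# Every access to A is dynamically typed, so the loop tends to be slow
# and to allocate.
function sum_global()
    s = 0.0
    for j in 1:size(A, 2), i in 1:size(A, 1)
        s += A[i, j]
    end
    return s
end

# The same loop, with the array passed in as an argument.
# Inside the function M is a local with a known concrete type.
function sum_local(M)
    s = 0.0
    for j in 1:size(M, 2), i in 1:size(M, 1)
        s += M[i, j]
    end
    return s
end

# Both versions compute the same result; only the cost differs.
@assert sum_global() ≈ sum_local(A)
```

Timing both functions with `@time` (after one warm-up call each, so that compilation is not counted) should show the difference in both run time and allocations.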

What I'm trying to do here is write a simple chunk of code that reproduces the conditions in the FE package. There, the vectorized code and the loops see only local variables, declared above the major loop. So, as far as I can tell, the conditions are the same as in the corrected fragment from the gist (only local variables).

Now I can see that the fragment, for some reason, did not reproduce the conditions from the full code. Indeed, as you predicted, in the corrected fragment the loop implementation is almost 10 times faster than the vectorized version. However, in the FE code the loops run twice as slow and consume more memory.

Just in case you, Andreas, or anyone else is curious, here is the full FE code that displays the weird behavior of loops being slower than vectorized code:

https://gist.github.com/PetrKryslUCSD/ae4a0f218fe50abe370f

Thanks again,

Petr

On Thursday, December 11, 2014 9:02:00 AM UTC-8, Andreas Noack wrote:
>
> See the comment in the gist.
>
> 2014-12-11 11:47 GMT-05:00 Petr Krysl <krysl...@gmail.com>:
>
>> Acting upon the advice that replacing matrix-matrix multiplications in
>> vectorized form with loops would help with performance, I chopped out a
>> piece of code from my finite element solver (
>> https://gist.github.com/anonymous/4ec426096c02faa4354d) and ran some
>> tests with the following results:
>>
>> Vectorized code:
>> elapsed time: 0.326802682 seconds (134490340 bytes allocated, 17.06% gc
>> time)
>>
>> Loops code:
>> elapsed time: 4.681451441 seconds (997454276 bytes allocated, 9.05% gc
>> time)
>>
>> SLOWER and using MORE memory?!
>>
>> I must be doing something terribly wrong.
>>
>> Petr