[julia-users] Aren't loops supposed to be faster?
Acting upon the advice that replacing matrix-matrix multiplications in vectorized form with loops would help with performance, I chopped out a piece of code from my finite element solver (https://gist.github.com/anonymous/4ec426096c02faa4354d) and ran some tests with the following results:

Vectorized code: elapsed time: 0.326802682 seconds (134490340 bytes allocated, 17.06% gc time)
Loops code: elapsed time: 4.681451441 seconds (997454276 bytes allocated, 9.05% gc time)

SLOWER and using MORE memory?! I must be doing something terribly wrong.

Petr
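For readers who want to reproduce this kind of comparison, here is a minimal sketch of one way to time a vectorized product against an in-place loop. The functions `mul_vectorized`, `mul_loops!`, `repeat_vectorized` and `repeat_loops!` are hypothetical stand-ins, not the code from the gist, and the 8-by-8 size is an arbitrary assumption. Each function is called once before timing so that compilation is not counted in the reported time and allocation.

```julia
# Hypothetical stand-ins for the two variants; this is NOT the code from the gist.
mul_vectorized(A, B) = A * B              # allocates a new result matrix on every call

function mul_loops!(C, A, B)              # writes into a preallocated C
    fill!(C, zero(eltype(C)))
    # Leftmost range is the outermost loop, so i (first subscript) varies fastest.
    for j in 1:size(B, 2), k in 1:size(A, 2), i in 1:size(A, 1)
        C[i, j] += A[i, k] * B[k, j]
    end
    return C
end

# Repeat the work inside functions so the timed loops only see their arguments.
function repeat_vectorized(A, B, n)
    C = mul_vectorized(A, B)
    for _ in 1:n
        C = mul_vectorized(A, B)
    end
    return C
end

function repeat_loops!(C, A, B, n)
    for _ in 1:n
        mul_loops!(C, A, B)
    end
    return C
end

A = rand(8, 8); B = rand(8, 8); C = zeros(8, 8)

# The first calls trigger compilation; run them once before timing.
repeat_vectorized(A, B, 1); repeat_loops!(C, A, B, 1)

@time repeat_vectorized(A, B, 10^5)
@time repeat_loops!(C, A, B, 10^5)
```

The numbers will depend on the machine, the matrix size, and the Julia version, so no expected output is given here.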
Re: [julia-users] Aren't loops supposed to be faster?
See the comment in the gist.

2014-12-11 11:47 GMT-05:00 Petr Krysl krysl.p...@gmail.com:
> Acting upon the advice that replacing matrix-matrix multiplications in vectorized form with loops would help with performance, I chopped out a piece of code from my finite element solver (https://gist.github.com/anonymous/4ec426096c02faa4354d) and ran some tests with the following results:
> Vectorized code: elapsed time: 0.326802682 seconds (134490340 bytes allocated, 17.06% gc time)
> Loops code: elapsed time: 4.681451441 seconds (997454276 bytes allocated, 9.05% gc time)
> SLOWER and using MORE memory?! I must be doing something terribly wrong.
> Petr
Re: [julia-users] Aren't loops supposed to be faster?
Dear Andreas,

Thank you very much. True, I had not noticed that. I put the definitions of the arrays outside of the two functions so that their results could be compared.

What I'm trying to do here is write a simple chunk of code that reproduces the conditions in the FE package. There, the vectorized code and the loops only see local variables, declared above the major loop. So, in my opinion, the conditions there are the same as in the corrected fragment from the gist (only local variables).

Now I can see that the fragment for some reason did not reproduce the conditions from the full code. Indeed, as you predicted, in the corrected fragment the loop implementation is almost 10 times faster than the vectorized version. However, in the FE code the loops run twice as slowly and consume more memory.

Just in case you, Andreas, or anyone else is curious, here is the full FE code that displays the weird behavior of loops being slower than vectorized code: https://gist.github.com/PetrKryslUCSD/ae4a0f218fe50abe370f

Thanks again,
Petr

On Thursday, December 11, 2014 9:02:00 AM UTC-8, Andreas Noack wrote:
> See the comment in the gist.
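The gist comment itself is not reproduced in this thread. Judging from the reply above, it presumably concerned arrays defined outside the functions, that is, non-constant global variables. Here is a minimal sketch of that effect, with illustrative names (`data`, `sum_global`, `sum_arg`) that are assumptions and do not come from the gist:

```julia
# Hypothetical example; names are illustrative and not taken from the gist.
data = rand(1000)          # non-constant global: its type could change at any time

# Reads the global directly, so the compiler cannot assume a concrete type for `data`.
function sum_global()
    s = 0.0
    for i in eachindex(data)
        s += data[i]
    end
    return s
end

# Receives the array as an argument, so the method is specialized for Vector{Float64}.
function sum_arg(x)
    s = 0.0
    for i in eachindex(x)
        s += x[i]
    end
    return s
end

sum_global(); sum_arg(data)      # warm up (compile) both functions
@time sum_global()               # typically slower and allocating
@time sum_arg(data)
```

Passing the arrays as function arguments, or declaring the globals `const`, usually removes the penalty. Whether this is what happens in the full FE code cannot be determined from the thread alone.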
Re: [julia-users] Aren't loops supposed to be faster?
One more note: I conjectured that perhaps the compiler was not able to correctly infer the type of the matrices, so I hardwired (in the actual FE code)

    Jac = 1.0; gradN = gradNparams[j]/(J); # get rid of Rm for the moment

This used about 10% less memory, with about the same runtime. So, no effect really. Loops are still slower than the vectorized code by a factor of two.

Petr
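Rather than hardwiring values, the inference conjecture can be checked directly. This is a sketch under assumptions: `scale_gradient` is a hypothetical stand-in for one step of the FE kernel, and the 3-by-2 matrices are arbitrary. `@code_warntype` (from the `InteractiveUtils` standard library in current Julia versions) prints the inferred types, and the unexported helper `Base.return_types` reports the inferred return type for given argument types.

```julia
using InteractiveUtils   # provides @code_warntype outside the REPL

# Hypothetical stand-in for one step of the FE kernel; not the author's code.
function scale_gradient(gradNparams_j, J)
    return gradNparams_j / J
end

gradNparams = [rand(3, 2) for _ in 1:4]   # assumed layout: a vector of small matrices
J = 2.0

# Prints the inferred types; abstract types such as Any in the output indicate
# places where the compiler could not infer a concrete type.
@code_warntype scale_gradient(gradNparams[1], J)

# Programmatic check: is the inferred return type concrete?
rt = Base.return_types(scale_gradient, (Matrix{Float64}, Float64))
println(all(isconcretetype, rt))
```

If the output shows only concrete types, type inference is probably not the bottleneck, which would be consistent with the hardwiring experiment having no effect.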
Re: [julia-users] Aren't loops supposed to be faster?
One thing I noticed after a quick glance: the ordering of your nested loops is very cache-unfriendly. Julia stores arrays in column-major order (same as Fortran), so nested loops should be arranged so that the first subscript of a multidimensional array varies most rapidly, that is, in the innermost loop.

--Peter

On Thursday, December 11, 2014 9:47:33 AM UTC-8, Petr Krysl wrote:
> One more note: I conjectured that perhaps the compiler was not able to infer correctly the type of the matrices, so I hardwired (in the actual FE code)
> Jac = 1.0; gradN = gradNparams[j]/(J); # get rid of Rm for the moment
> About 10% less memory used, runtime about the same. So, no effect really. Loops are still slower than the vectorized code by a factor of two.
> Petr
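The column-major advice can be illustrated with a minimal sketch. The function names and the 1000-by-1000 size are assumptions chosen for illustration. Both functions compute the same sum and differ only in which subscript the innermost loop varies.

```julia
# Column-major-friendly: the first subscript i varies fastest (innermost loop),
# so consecutive iterations touch adjacent memory locations.
function sum_cols_first(A)
    s = zero(eltype(A))
    for j in 1:size(A, 2)
        for i in 1:size(A, 1)
            s += A[i, j]
        end
    end
    return s
end

# Cache-unfriendly: the second subscript j varies fastest,
# so consecutive iterations jump by a whole column length in memory.
function sum_rows_first(A)
    s = zero(eltype(A))
    for i in 1:size(A, 1)
        for j in 1:size(A, 2)
            s += A[i, j]
        end
    end
    return s
end

A = rand(1000, 1000)
sum_cols_first(A); sum_rows_first(A)   # warm up (compile) before timing
@time sum_cols_first(A)
@time sum_rows_first(A)
```

The gap tends to grow with array size. For matrices small enough to fit entirely in cache, which may be the case for element-level matrices in a finite element code, the loop order may matter much less.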
Re: [julia-users] Aren't loops supposed to be faster?
I experimented with that a little bit before (with mx as the innermost loop): it does not make a difference.

On Thursday, December 11, 2014 9:55:46 AM UTC-8, Peter Simon wrote:
> One thing I noticed after a quick glance: The ordering of your nested loops is very cache-unfriendly. Julia stores arrays in column-major order (same as Fortran) so that nested loops should arrange that the first subscripts of multidimensional arrays are varied most rapidly.
> --Peter