[julia-users] Aren't loops supposed to be faster?

2014-12-11 Thread Petr Krysl
Acting upon the advice that replacing matrix-matrix multiplications in 
vectorized form with loops would help with performance, I chopped out a 
piece of code from my finite element solver 
(https://gist.github.com/anonymous/4ec426096c02faa4354d) and ran some tests 
with the following results:

Vectorized code:
elapsed time: 0.326802682 seconds (134490340 bytes allocated, 17.06% gc time)

Loops code:
elapsed time: 4.681451441 seconds (997454276 bytes allocated, 9.05% gc time)

SLOWER and using MORE memory?!

I must be doing something terribly wrong.
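
For readers unfamiliar with the distinction, here is a minimal sketch of the two forms being compared. It is not the code from the gist: the function name matmul_loops! and the matrix sizes are made up for illustration, and it uses current Julia syntax rather than the 2014 release.

```julia
# Loop form: C = A * B written into a preallocated C, with the first subscript
# of C and A varying in the innermost loop.
function matmul_loops!(C, A, B)
    fill!(C, zero(eltype(C)))
    for j in 1:size(B, 2), k in 1:size(A, 2)
        b = B[k, j]
        for i in 1:size(A, 1)
            C[i, j] += A[i, k] * b
        end
    end
    return C
end

A = rand(8, 3); B = rand(3, 8)   # made-up, element-sized matrices
C = zeros(8, 8)                  # output buffer, allocated once

C_vectorized = A * B             # vectorized form: allocates a new matrix per call
matmul_loops!(C, A, B)           # loop form: reuses C

println(isapprox(C, C_vectorized))
```

The loop form can only win if the compiler knows the concrete types of C, A and B, which is why passing them as function arguments matters.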

Petr



Re: [julia-users] Aren't loops supposed to be faster?

2014-12-11 Thread Andreas Noack
See the comment in the gist.





Re: [julia-users] Aren't loops supposed to be faster?

2014-12-11 Thread Petr Krysl
Dear Andreas,

Thank you very much. True, I had not noticed that. I put the definitions 
of the arrays outside of the two functions so that their results could be 
compared.

What I'm trying to do here is write a simple chunk of code that would 
reproduce the conditions in the FE package. There, both the vectorized code 
and the loops see only local variables, declared above the major loop. So in 
my opinion the conditions there are the same as in the corrected fragment 
from the gist (only local variables).
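
The difference between arrays defined outside the functions and arrays seen as locals can be sketched as follows. This is an illustration under assumptions, not the gist: the names A_global, sum_global and sum_local and the array size are hypothetical, and the syntax is current Julia.

```julia
# Hypothetical names and sizes, chosen only for illustration.
A_global = rand(200, 200)   # non-constant global: its type could change at any time

# Reads the global directly, so the loop cannot be specialized on its type.
function sum_global()
    s = 0.0
    for j in 1:size(A_global, 2), i in 1:size(A_global, 1)
        s += A_global[i, j]
    end
    return s
end

# Receives the array as an argument, so the loop is compiled for its concrete type.
function sum_local(A)
    s = 0.0
    for j in 1:size(A, 2), i in 1:size(A, 1)
        s += A[i, j]
    end
    return s
end

sum_global(); sum_local(A_global)        # warm up so compilation is not timed
t_global = @elapsed sum_global()
t_local = @elapsed sum_local(A_global)
println("global: ", t_global, " s, argument: ", t_local, " s")
```

Both functions compute the same sum; only the way the array reaches the loop differs.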

Now I can see that the fragment, for some reason, did not reproduce the 
conditions from the full code. Indeed, as you predicted, the loop 
implementation is almost 10 times faster than the vectorized version. 
However, in the FE code the loops run twice as slowly and consume more 
memory.

Just in case you, Andreas, or anyone else is curious, here is the full FE 
code that displays the weird behavior of loops being slower than vectorized 
code:
https://gist.github.com/PetrKryslUCSD/ae4a0f218fe50abe370f

Thanks again,

Petr





Re: [julia-users] Aren't loops supposed to be faster?

2014-12-11 Thread Petr Krysl
One more note: I conjectured that perhaps the compiler was not able to 
infer the type of the matrices correctly, so I hardwired the following (in 
the actual FE code):

Jac = 1.0; gradN = gradNparams[j]/(J); # get rid of Rm for the moment

That used about 10% less memory, with the runtime about the same. So, no 
effect really. Loops are still slower than the vectorized code by a factor 
of two.
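
One way to test a conjecture like this directly, rather than by hardwiring values, is to ask Julia what it infers. The sketch below assumes current Julia (the @code_warntype macro was not available in the 2014 release), and scale_gradient, the sizes, and the treatment of J as a square matrix are stand-ins made up for illustration, not the FE code.

```julia
using InteractiveUtils  # provides @code_warntype outside the REPL

# Hypothetical stand-in for the line quoted above: a gradient matrix
# right-divided by a Jacobian matrix.
scale_gradient(gradNparam, J) = gradNparam / J

gradNparam = rand(8, 3)                          # made-up sizes
J = [2.0 0.0 0.0; 0.0 2.0 0.0; 0.0 0.0 2.0]      # made-up Jacobian

# In the printed output, look for variables or return types shown as Any or Union.
@code_warntype scale_gradient(gradNparam, J)

# The same question asked programmatically: the inferred return type(s).
inferred = Base.return_types(scale_gradient, (Matrix{Float64}, Matrix{Float64}))
println(inferred)
```

If everything is inferred as a concrete type, type inference is unlikely to be the cause of the extra allocation.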

Petr




Re: [julia-users] Aren't loops supposed to be faster?

2014-12-11 Thread Peter Simon
One thing I noticed after a quick glance: the ordering of your nested 
loops is very cache-unfriendly. Julia stores arrays in column-major order 
(same as Fortran), so nested loops should be arranged so that the first 
subscript of a multidimensional array varies most rapidly, that is, in the 
innermost loop.
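
A minimal sketch of the two loop orderings, assuming a plain dense matrix. The function names and the matrix size are made up for illustration, and the syntax is current Julia.

```julia
# First subscript varies fastest: consecutive iterations touch adjacent memory.
function sum_columns_first(A)
    s = zero(eltype(A))
    for j in 1:size(A, 2)          # outer loop: second subscript
        for i in 1:size(A, 1)      # inner loop: first subscript
            s += A[i, j]
        end
    end
    return s
end

# Second subscript varies fastest: each iteration jumps by a whole column.
function sum_rows_first(A)
    s = zero(eltype(A))
    for i in 1:size(A, 1)
        for j in 1:size(A, 2)
            s += A[i, j]
        end
    end
    return s
end

A = rand(2000, 2000)                       # made-up size
sum_columns_first(A); sum_rows_first(A)    # warm up so compilation is not timed
t_cols = @elapsed sum_columns_first(A)
t_rows = @elapsed sum_rows_first(A)
println("first subscript innermost:  ", t_cols, " s")
println("second subscript innermost: ", t_rows, " s")
```

For small matrices that fit in cache, such as typical element matrices, the difference may be negligible.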

--Peter





Re: [julia-users] Aren't loops supposed to be faster?

2014-12-11 Thread Petr Krysl
I experimented with that a little bit before (with mx as the innermost 
loop): it does not make a difference.
