I experimented with that a little before (making mx the innermost loop): it does not make a difference.
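
For concreteness, here is a minimal sketch of the loop ordering in question. It is not the actual FE code: the function names `sum_colmajor` and `sum_rowmajor` and the plain matrix sum are made up purely for illustration. The first version varies the first subscript fastest, which matches Julia's column-major layout; the second swaps the loops.

```julia
# Hypothetical illustration only; names and the workload are assumptions,
# not taken from the FE code discussed in this thread.

# First subscript varies fastest: the inner loop walks contiguous memory.
function sum_colmajor(A::Matrix{Float64})
    s = 0.0
    for j in 1:size(A, 2)        # second subscript in the outer loop
        for i in 1:size(A, 1)    # first subscript in the inner loop
            s += A[i, j]
        end
    end
    return s
end

# Loops swapped: the inner loop jumps from column to column.
function sum_rowmajor(A::Matrix{Float64})
    s = 0.0
    for i in 1:size(A, 1)        # first subscript in the outer loop
        for j in 1:size(A, 2)    # second subscript in the inner loop
            s += A[i, j]
        end
    end
    return s
end

A = rand(2000, 2000)

# Warm up once so compilation time is not included in the timings.
sum_colmajor(A); sum_rowmajor(A)

t_col = @elapsed sum_colmajor(A)
t_row = @elapsed sum_rowmajor(A)
println("column-major order: ", t_col, " s, swapped order: ", t_row, " s")
```

Any gap between the two timings tends to show up on large arrays; for small matrices that fit in cache the ordering may matter much less.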
On Thursday, December 11, 2014 9:55:46 AM UTC-8, Peter Simon wrote:
>
> One thing I noticed after a quick glance: The ordering of your nested
> loops is very cache-unfriendly. Julia stores arrays in column-major order
> (same as Fortran) so that nested loops should arrange that the first
> subscripts of multidimensional arrays are varied most rapidly.
>
> --Peter
>
> On Thursday, December 11, 2014 9:47:33 AM UTC-8, Petr Krysl wrote:
>>
>> One more note: I conjectured that perhaps the compiler was not able to
>> infer correctly the type of the matrices, so I hardwired (in the actual FE
>> code)
>>
>> Jac = 1.0; gradN = gradNparams[j]/(J); # get rid of Rm for the moment
>>
>> About 10% less memory used, runtime about the same. So, no effect
>> really. Loops are still slower than the vectorized code by a factor of two.
>>
>> Petr