One more note: I conjectured that the compiler might not be inferring the
types of the matrices correctly, so I hardwired the following (in the actual
FE code):

Jac = 1.0; gradN = gradNparams[j]/(J); # get rid of Rm for the moment

This used about 10% less memory, but the runtime was about the same, so there
was no real effect. The loops are still slower than the vectorized code by a
factor of two.
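
For anyone who wants to check the inference question directly rather than by
hardwiring values, here is a minimal sketch. The function scale! is a
hypothetical stand-in for a loop kernel, not the actual FE code; it only
assumes concretely typed Float64 inputs.

```julia
using Test              # standard library; provides @inferred
using InteractiveUtils  # provides @code_warntype outside the REPL

# Hypothetical stand-in for a loop kernel: divide each matrix entry by a scalar.
function scale!(out::Matrix{Float64}, A::Matrix{Float64}, J::Float64)
    @inbounds for i in eachindex(A)
        out[i] = A[i] / J
    end
    return out
end

A = rand(3, 2)
out = similar(A)

# Throws an error if the return type of the call cannot be inferred concretely.
@inferred scale!(out, A, 2.0)

# Prints the typed code; entries reported as Any or a Union mark inference problems.
@code_warntype scale!(out, A, 2.0)
```

If both checks come back clean for the real kernel, type inference is probably
not the cause of the slowdown, which would be consistent with the result above.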

Petr

