https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930
--- Comment #6 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
Thanks Thomas, somehow I thought we would have built the temporary to do this.
(Well, actually we do, but after the frontend passes.)

Now we get:

$ gfc -O2 tp_array.f90
$ time ./a.out
 This code variant uses intrinsic arrays to represent the contents of Type(Vect3D).
 Random Numbers, time:     43.6485367
 Using SUM, time:          2.20666122
 Using MATMUL (L), time:   1.58225632
 Using MATMUL (R), time:   7.54129410

where, for the LEFT case, I did this:

  type(Vect3D) pure function TP_LEFT(NU, D, NV) result(tensorproduct)
    real(dp), intent(in) :: NU(4), NV(4)
    real(dp) :: tmp(4)
    type(Vect3D), intent(in) :: D(4,4)

    tmp = matmul(NU, D%vec(1))
    tensorproduct%vec(1) = dot_product(tmp, NV) ! "left"

    tmp = matmul(NU, D%vec(2))
    tensorproduct%vec(2) = dot_product(tmp, NV)

    tmp = matmul(NU, D%vec(3))
    tensorproduct%vec(3) = dot_product(tmp, NV) ! gives more expected results
  end function

and just for grins:

$ gfc -Ofast -march=native -ftree-vectorize tp_array.f90
$ time ./a.out
 This code variant uses intrinsic arrays to represent the contents of Type(Vect3D).
 Random Numbers, time:     42.7615433
 Using SUM, time:          0.741546631
 Using MATMUL (L), time:   0.522426605
 Using MATMUL (R), time:   6.76409149

real    0m51.331s
user    0m50.389s
sys     0m0.501s

So we need to be careful how we use the tool to get the most out of the
optimizers.
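
For anyone who wants to try the two orderings side by side, here is a minimal, self-contained sketch. Only TP_LEFT appears in this comment; the tp_right variant below is an assumption about what the "(R)" timing measures (contracting D with NV first), and the module name, the loop over components, and the test data are all made up for illustration. It only checks that both orderings give the same numbers; it does not reproduce the timings.

```fortran
! Sketch only: tp_right and the test data are assumptions, not taken from tp_array.f90.
module tp_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  type :: Vect3D
    real(dp) :: vec(3)
  end type Vect3D

contains

  ! "Left" ordering, as in TP_LEFT: contract NU with D first, then reduce with NV.
  pure function tp_left(NU, D, NV) result(tp)
    real(dp), intent(in) :: NU(4), NV(4)
    type(Vect3D), intent(in) :: D(4,4)
    type(Vect3D) :: tp
    integer :: k
    do k = 1, 3
      tp%vec(k) = dot_product(matmul(NU, D%vec(k)), NV)
    end do
  end function tp_left

  ! Assumed "right" ordering: contract D with NV first, then reduce with NU.
  pure function tp_right(NU, D, NV) result(tp)
    real(dp), intent(in) :: NU(4), NV(4)
    type(Vect3D), intent(in) :: D(4,4)
    type(Vect3D) :: tp
    integer :: k
    do k = 1, 3
      tp%vec(k) = dot_product(NU, matmul(D%vec(k), NV))
    end do
  end function tp_right

end module tp_sketch

program tp_demo
  use tp_sketch
  implicit none
  type(Vect3D) :: D(4,4), left, right
  real(dp) :: NU(4), NV(4)
  integer :: i, j, k

  ! Arbitrary, exactly representable test values.
  NU = [1.0_dp, 2.0_dp, 3.0_dp, 4.0_dp]
  NV = [0.5_dp, -1.0_dp, 2.0_dp, 0.25_dp]
  do j = 1, 4
    do i = 1, 4
      do k = 1, 3
        D(i,j)%vec(k) = real(i + 2*j + 3*k, dp)
      end do
    end do
  end do

  left  = tp_left(NU, D, NV)
  right = tp_right(NU, D, NV)

  ! Both orderings compute sum over i,j of NU(i) * D(i,j)%vec(k) * NV(j).
  if (any(abs(left%vec - right%vec) > 1.0e-9_dp)) error stop "orderings disagree"
  print *, "left :", left%vec
  print *, "right:", right%vec
end program tp_demo
```

Note that D%vec(k) is a strided section of the derived-type array in both variants, so any difference in run time between them would come from how each MATMUL call is handled, not from the arithmetic.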