https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930

--- Comment #5 from Adam Hirst <adam at aphirst dot karoo.co.uk> ---
Hmm, even with -Ofast, I don't get any noticeable performance increase if I
change, say, TP_LEFT to be:

  type(Vect3D) pure function TP_LEFT(NU, D, NV) result(tensorproduct)
    real(dp),     intent(in) :: NU(4), NV(4)
    type(Vect3D), intent(in) :: D(4,4)
    real(dp)                 :: Dx(4,4), Dy(4,4), Dz(4,4)

    Dx = D%x
    Dy = D%y
    Dz = D%z
    tensorproduct%x = dot_product(matmul(NU, Dx),NV)
    tensorproduct%y = dot_product(matmul(NU, Dy),NV)
    tensorproduct%z = dot_product(matmul(NU, Dz),NV)
  end function

Perhaps you meant to introduce the explicit temporaries at a different level,
or there's another flag I need.

It may be worth noting, though, that -Ofast makes the "explicit DO"
implementation even faster, so in the meantime I'll definitely look into
reintroducing -Ofast in my real codebase.
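
For readers without the attachment to hand, here is a minimal sketch of what
an "explicit DO" variant of TP_LEFT might look like. It is not the code from
the bug report: the module name tp_sketch, the function name TP_LEFT_DO, the
definition of dp and the layout of Vect3D are all assumptions made so that the
example compiles on its own. The small driver program checks the loop version
against the dot_product(matmul(...)) form shown above.

```fortran
! Hypothetical sketch: tp_sketch, TP_LEFT_DO, dp and the Vect3D layout are
! assumptions for illustration, not taken from the bug report's test case.
module tp_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  type :: Vect3D
    real(dp) :: x, y, z
  end type

contains

  ! Accumulates sum over i,j of NU(i) * D(i,j) * NV(j) for each component,
  ! using explicit loops instead of matmul/dot_product.
  pure function TP_LEFT_DO(NU, D, NV) result(tensorproduct)
    real(dp),     intent(in) :: NU(4), NV(4)
    type(Vect3D), intent(in) :: D(4,4)
    type(Vect3D)             :: tensorproduct
    real(dp)                 :: w
    integer                  :: i, j

    tensorproduct = Vect3D(0.0_dp, 0.0_dp, 0.0_dp)
    do j = 1, 4
      do i = 1, 4        ! inner loop runs over the first (fastest) index
        w = NU(i) * NV(j)
        tensorproduct%x = tensorproduct%x + w * D(i,j)%x
        tensorproduct%y = tensorproduct%y + w * D(i,j)%y
        tensorproduct%z = tensorproduct%z + w * D(i,j)%z
      end do
    end do
  end function

end module

program demo
  use tp_sketch
  implicit none
  type(Vect3D) :: D(4,4), r
  real(dp)     :: NU(4), NV(4), ref_x
  integer      :: i, j

  ! Arbitrary test data.
  NU = [1.0_dp, 2.0_dp, 3.0_dp, 4.0_dp]
  NV = [0.5_dp, 0.25_dp, 2.0_dp, 1.0_dp]
  do j = 1, 4
    do i = 1, 4
      D(i,j) = Vect3D(real(i + j, dp), real(i - j, dp), real(i * j, dp))
    end do
  end do

  r = TP_LEFT_DO(NU, D, NV)

  ! Compare the x component with the matmul/dot_product formulation.
  ref_x = dot_product(matmul(NU, D%x), NV)
  if (abs(r%x - ref_x) > 1.0e-12_dp) error stop "mismatch in x component"
  print *, "loop result:", r%x, r%y, r%z
end program
```

Something like `gfortran -Ofast sketch.f90` should build it; whether the loop
form or the matmul form is faster will depend on the compiler version and
flags, which is exactly what this report is about.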
