https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79930

--- Comment #6 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
Thanks Thomas, somehow I thought we would have built the temporary to do this.
(Well, actually we do, but after the frontend passes.)

Now we get:

$ gfc -O2 tp_array.f90 
$ time ./a.out 
 This code variant uses intrinsic arrays to represent the contents of
Type(Vect3D).
 Random Numbers, time:     43.6485367    
 Using SUM, time:          2.20666122    
 Using MATMUL (L), time:   1.58225632    
 Using MATMUL (R), time:   7.54129410 

For the LEFT case I did this:

  type(Vect3D) pure function TP_LEFT(NU, D, NV) result(tensorproduct)
    real(dp),     intent(in) :: NU(4), NV(4)
    real(dp)                 :: tmp(4)
    type(Vect3D), intent(in) :: D(4,4)

    tmp = matmul(NU, D%vec(1))
    tensorproduct%vec(1) = dot_product(tmp, NV) ! "left"
    tmp = matmul(NU, D%vec(2))
    tensorproduct%vec(2) = dot_product(tmp, NV)
    tmp = matmul(NU, D%vec(3))
    tensorproduct%vec(3) = dot_product(tmp, NV) ! gives more expected results
  end function

and just for grins:

$ gfc -Ofast -march=native -ftree-vectorize tp_array.f90 
$ time ./a.out 
 This code variant uses intrinsic arrays to represent the contents of
Type(Vect3D).
 Random Numbers, time:     42.7615433    
 Using SUM, time:         0.741546631    
 Using MATMUL (L), time:  0.522426605    
 Using MATMUL (R), time:   6.76409149    

real    0m51.331s
user    0m50.389s
sys     0m0.501s

So we need to be careful how we use the tool to get the most out of the
optimizers.
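
For anyone who wants to try the pattern in isolation, here is a minimal
self-contained sketch. It is not the tp_array.f90 test case from this PR. The
kind dp, the layout of Vect3D, the random test data and the tp_right function
are all assumptions made for illustration; only tp_left follows the function
shown above. It just checks that the two orderings agree up to rounding.

```fortran
program tp_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)  ! assumed kind; not shown in the comment

  ! Assumed layout of Vect3D: three components held in an intrinsic array.
  type :: Vect3D
    real(dp) :: vec(3)
  end type Vect3D

  type(Vect3D) :: D(4,4), left, right
  real(dp)     :: NU(4), NV(4), a(4,4)
  integer      :: k

  ! Fill the inputs with arbitrary test data.
  call random_number(NU)
  call random_number(NV)
  do k = 1, 3
    call random_number(a)
    D%vec(k) = a
  end do

  left  = tp_left(NU, D, NV)
  right = tp_right(NU, D, NV)

  ! Both orderings evaluate NU^T * D * NV, so they should agree up to rounding.
  print *, 'max |left - right| =', maxval(abs(left%vec - right%vec))

contains

  ! Same shape as TP_LEFT above: explicit temporary, vector * matrix first.
  type(Vect3D) pure function tp_left(NU, D, NV) result(tensorproduct)
    real(dp),     intent(in) :: NU(4), NV(4)
    type(Vect3D), intent(in) :: D(4,4)
    real(dp)                 :: tmp(4)
    integer                  :: k

    do k = 1, 3
      tmp = matmul(NU, D%vec(k))
      tensorproduct%vec(k) = dot_product(tmp, NV)
    end do
  end function tp_left

  ! Hypothetical "right" counterpart (a guess, not taken from the PR):
  ! matrix * vector first, then the dot product with NU.
  type(Vect3D) pure function tp_right(NU, D, NV) result(tensorproduct)
    real(dp),     intent(in) :: NU(4), NV(4)
    type(Vect3D), intent(in) :: D(4,4)
    real(dp)                 :: tmp(4)
    integer                  :: k

    do k = 1, 3
      tmp = matmul(D%vec(k), NV)
      tensorproduct%vec(k) = dot_product(NU, tmp)
    end do
  end function tp_right

end program tp_sketch
```

Build it the same way as above (gfortran -O2, or with the -Ofast flags) to
compare. It does not time anything, so wrap the calls in a loop with cpu_time
if you want numbers.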
