https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82471

Dominique d'Humieres <dominiq at lps dot ens.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2017-10-08
     Ever confirmed|0                           |1

--- Comment #3 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
> I know that
> DO CONCURRENT( I=1:L, J=1:M, K=1:N)
> is the fastest

 DO CONCURRENT :    8.19799995    
 DO CONCURRENT :   0.284000009    
 ORDINARY DO   :   0.116000004    
 ARRAY DO      :   0.118000008    

Note that the "right" ordered DO CONCURRENT is more than two times slower than
the ORDINARY DO or the ARRAY DO (C=A+B).

> but I expected that do-concurrent work like ordinary-do by varying
> the last index in nested loops.

Why? I would rather expect the left-most index to drive the inner loop, the
second index the first outer loop, and so on (latin ordering).
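As a minimal sketch of the ordering described above (not the test case from the
report): the explicit nest below puts the left-most index in the inner loop, so
consecutive iterations touch adjacent elements in Fortran's column-major
layout. The extents and array names are hypothetical, and the standard does not
prescribe any iteration order for DO CONCURRENT; the second loop is only there
to check that both forms compute the same result.

```fortran
program loop_order_sketch
  implicit none
  integer, parameter :: l = 4, m = 3, n = 2   ! hypothetical small extents
  real :: a(l,m,n), b(l,m,n), c(l,m,n), d(l,m,n)
  integer :: i, j, k

  call random_number(a)
  call random_number(b)

  ! Explicit nest: left-most index i in the inner loop, k in the outermost.
  do k = 1, n
    do j = 1, m
      do i = 1, l
        c(i,j,k) = a(i,j,k) + b(i,j,k)
      end do
    end do
  end do

  ! Same computation as DO CONCURRENT; the iteration order is up to the compiler.
  do concurrent (i = 1:l, j = 1:m, k = 1:n)
    d(i,j,k) = a(i,j,k) + b(i,j,k)
  end do

  ! Both forms must agree with the array expression C = A + B.
  if (any(abs(c - (a + b)) > epsilon(1.0))) error stop "explicit nest mismatch"
  if (any(abs(d - (a + b)) > epsilon(1.0))) error stop "do concurrent mismatch"
  print *, "ok"
end program loop_order_sketch
```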

Note also that the optimization you are expecting should be done in the
middle-end. Unfortunately "loop flattening" (the best optimization here) is not
done by the gcc middle-end (pr82450).

More complex optimizations, such as the matrix multiplication in
https://groups.google.com/forum/#!topic/comp.lang.fortran/jljio5HfSQc, require
exchanging the loop order, which is not (well) handled by the gcc
middle-end (pr61000).

Also, some loop interchanges require a cost model:

do j = 1, N
  do k = 1, L
    do i = 1, M
      c(i,j) = c(i,j) + a(i,k)*b(k,j)
    end do
  end do
end do

is faster on modern CPUs than

do j = 1, N
  do i = 1, M
    do k = 1, L
      c(i,j) = c(i,j) + a(i,k)*b(k,j)
    end do
  end do
end do

but to reach this conclusion a priori requires detailed knowledge of the CPU
and the memory system (while the exchange of i and j can never be profitable).
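The comparison above can be tried with a small timing sketch such as the one
below. The matrix sizes, array names and the use of cpu_time are assumptions
made for illustration; the measured times depend on the compiler, the options
and the machine, so no particular outcome is implied.

```fortran
program matmul_order_sketch
  implicit none
  integer, parameter :: l = 256, m = 256, n = 256   ! hypothetical sizes
  real, allocatable :: a(:,:), b(:,:), c1(:,:), c2(:,:)
  real :: t0, t1, t2
  integer :: i, j, k

  allocate(a(m,l), b(l,n), c1(m,n), c2(m,n))
  call random_number(a)
  call random_number(b)
  c1 = 0.0
  c2 = 0.0

  call cpu_time(t0)
  ! j-k-i order: the inner loop runs over i, the contiguous index of c and a.
  do j = 1, n
    do k = 1, l
      do i = 1, m
        c1(i,j) = c1(i,j) + a(i,k)*b(k,j)
      end do
    end do
  end do
  call cpu_time(t1)
  ! j-i-k order: the inner loop runs over k, so a(i,k) is accessed with stride m.
  do j = 1, n
    do i = 1, m
      do k = 1, l
        c2(i,j) = c2(i,j) + a(i,k)*b(k,j)
      end do
    end do
  end do
  call cpu_time(t2)

  ! Both orders accumulate the same products, so the results should agree.
  if (maxval(abs(c1 - c2)) > 1.0e-3) error stop "results differ"
  print *, "j-k-i order :", t1 - t0
  print *, "j-i-k order :", t2 - t1
end program matmul_order_sketch
```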
