https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119

Jerry DeLisle <jvdelisle at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jvdelisle at gcc dot gnu.org

--- Comment #16 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
For what it's worth:

$ gfc pr51119.f90 -lblas -fno-external-blas -Ofast -march=native 
$ ./a.out 
 Time, MATMUL:    21.2483196       21.254449646000001     1.5055670945599979    
 Time, dgemm:    33.2441711       33.243087289000002      .96260614189671445    

This is on a laptop not taking any advantage of a tuned BLAS.  If I replace
-Ofast with -O2 I get:

$ ./a.out 
 Time, MATMUL:    43.6199570       43.625358022999997    0.73351833543988521    
 Time, dgemm:    33.2262650       33.226961453000001     0.96307331759072967 

-O3 brings MATMUL performance back in line with -Ofast. It seems odd to me that
-O2 does so poorly.

Regardless, with -Ofast the internal MATMUL is doing better than BLAS on this
platform, but 1.5 gflops is pretty lame either way.
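
As a quick sanity check on the figures above, here is a small sketch that
recomputes the ratios from the reported numbers. It assumes the second column
is wall-clock seconds and the third is gflops, which is inferred from the
output and not taken from pr51119.f90 (not shown here). The variable names are
made up for illustration.

```python
# Wall-clock seconds and reported gflops, copied from the runs above.
# Assumption: column 2 is wall time, column 3 is gflops.
runs = {
    ("MATMUL", "-Ofast"): (21.254449646000001, 1.5055670945599979),
    ("dgemm", "-Ofast"): (33.243087289000002, 0.96260614189671445),
    ("MATMUL", "-O2"): (43.625358022999997, 0.73351833543988521),
    ("dgemm", "-O2"): (33.226961453000001, 0.96307331759072967),
}

# If gflops = operations / seconds / 1e9, then seconds * gflops gives the
# operation count in units of 1e9, which should be the same for every run.
implied_gflop = {key: secs * rate for key, (secs, rate) in runs.items()}
for (routine, opt), total in implied_gflop.items():
    print(f"{routine:7s} {opt:7s} implied work: {total:.3f}e9 operations")

# How much slower MATMUL gets when going from -Ofast to -O2.
matmul_slowdown = runs[("MATMUL", "-O2")][0] / runs[("MATMUL", "-Ofast")][0]

# How much slower dgemm is than MATMUL at -Ofast.
dgemm_vs_matmul = runs[("dgemm", "-Ofast")][0] / runs[("MATMUL", "-Ofast")][0]

print(f"MATMUL -O2 / -Ofast wall time: {matmul_slowdown:.2f}x")
print(f"dgemm / MATMUL wall time at -Ofast: {dgemm_vs_matmul:.2f}x")
```

The product of time and rate comes out the same in all four runs, so the
gflops column looks like a fixed operation count divided by wall time. MATMUL
takes roughly twice as long at -O2 as at -Ofast, while dgemm is unaffected by
the optimization level, as expected for a routine from the external library.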
