On Thu, Mar 18, 2021 at 04:05:40PM +0100, Richard Biener wrote:
> On Thu, Mar 18, 2021 at 3:48 PM Tobias Burnus <tob...@codesourcery.com> wrote:
> >
> > Richard,
> >
> > On 18.03.21 13:35, Richard Biener via Fortran wrote:
> > > [...]
> > > Since the libgfortran MATMUL should be vectorized
> > > I think it's not reasonable to inline any but _very_ small
> > > MATMUL at optimization levels that do not enable vectorization.
> >
> > Besides the obvious if (!flag_external_blas) which should always prevent
> > inlining (possibly except for tiny N like N=1), your idea is 'if (N
> > small || flag_tree_loop_vectorize)'?
> >
> > Or are you thinking of a different or additional flag_... than
> > flag_tree_loop_vectorize for making this choice?
> 
> Yes, I was thinking of flag_tree_loop_vectorize.  Of course libgfortran
> is far from having micro-optimized matmul for various architectures
> but IIRC it uses attribute(target) to provide several overloads.  So
> maybe only ever inlining tiny matmul makes sense as well (does the
> runtime have specializations for small sizes?)
> 

With -fexternal-blas, there is a default cross-over value of N=30,
which can be changed with the -fblas-matmul-limit=N option.

I forgot the important example, but Thomas seems to be aware.

% gfcx -o z -O2 -fno-frontend-optimize -fexternal-blas a.f90 && ./z
/usr/local/bin/ld: /tmp/ccOe3VoD.o: in function `MAIN__':
a.f90:(.text+0x156): undefined reference to `sgemm_'
collect2: error: ld returned 1 exit status

sgemm_ would come from a tuned BLAS library such as OpenBLAS.
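
For illustration only, here is a minimal sketch of the kind of program
that should show the same behaviour.  This is not the original a.f90
(which is not shown in this mail); the program name and the size n = 64
are assumptions, chosen so that n lies above the default limit of 30.

```fortran
! Hypothetical stand-in for a.f90; the original file is not shown here.
program matmul_blas_demo
  implicit none
  ! Assumed size, chosen to be above the default -fblas-matmul-limit of 30.
  integer, parameter :: n = 64
  real :: a(n,n), b(n,n), c(n,n)

  call random_number(a)
  call random_number(b)

  ! Default-kind REAL operands: with -fexternal-blas and n at or above the
  ! limit, the compiled code is expected to reference sgemm_ for this product.
  c = matmul(a, b)

  ! Use the result so the multiplication cannot be optimized away.
  print *, sum(c)
end program matmul_blas_demo
```

Compiled with -fexternal-blas and no BLAS on the link line, this should
fail with an undefined reference to sgemm_ as above; adding a BLAS
library (for example -lopenblas, assuming OpenBLAS is installed) should
let it link.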

I was going to suggest adding a testcase that scans a dump
for sgemm.  It seems matmul_blas_1.f tests the -fexternal-blas
and -fblas-matmul-limit=N options, but it doesn't look for sgemm.
This, I believe, does the checking:

diff --git a/gcc/testsuite/gfortran.dg/matmul_blas_1.f b/gcc/testsuite/gfortran.dg/matmul_blas_1.f
index 6a88981c9d7..52298d09cce 100644
--- a/gcc/testsuite/gfortran.dg/matmul_blas_1.f
+++ b/gcc/testsuite/gfortran.dg/matmul_blas_1.f
@@ -237,4 +237,4 @@ C Test calling of BLAS routines
       if (any (c /= cres)) stop 20
 
       end
-! { dg-final { scan-tree-dump-times "_gfortran_matmul" 0 "optimized" } }
+! { dg-final { scan-tree-dump "sgemm" "optimized" } }

-- 
Steve
