On Thu, Mar 18, 2021 at 04:05:40PM +0100, Richard Biener wrote:
> On Thu, Mar 18, 2021 at 3:48 PM Tobias Burnus <tob...@codesourcery.com> wrote:
> >
> > Richard,
> >
> > On 18.03.21 13:35, Richard Biener via Fortran wrote:
> > > [...]
> > > Since the libgfortran MATMUL should be vectorized
> > > I think it's not reasonable to inline any but _very_ small
> > > MATMUL at optimization levels that do not enable vectorization.
> >
> > Besides the obvious if (!flag_external_blas) which should always prevent
> > inlining (possibly except for tiny N like N=1), your idea is 'if (N
> > small || flag_tree_loop_vectorize)'?
> >
> > Or are you thinking of a different or additional flag_... than
> > flag_tree_loop_vectorize for making this choice?
>
> Yes, I was thinking of flag_tree_loop_vectorize.  Of course libgfortran
> is far from having micro-optimized matmul for various architectures
> but IIRC it uses attribute(target) to provide several overloads.  So
> maybe only ever inlining tiny matmul makes sense as well (does the
> runtime have specializations for small sizes?)
>

With -fexternal-blas, there is a cross-over value of N=30, which
can be changed by the -fblas-matmul-limit=N option.  I forgot the
important example, but Thomas seems to be aware.

% gfcx -o z -O2 -fno-frontend-optimize -fexternal-blas a.f90 && ./z
/usr/local/bin/ld: /tmp/ccOe3VoD.o: in function `MAIN__':
a.f90:(.text+0x156): undefined reference to `sgemm_'
collect2: error: ld returned 1 exit status

sgemm_ would come from a tuned BLAS library such as OpenBLAS.

I was going to suggest adding a testcase that scans a dump for
sgemm.  It seems matmul_blas_1.f tests the -fexternal-blas and
-fblas-matmul-limit=N options, but it doesn't look for sgemm.
This, I believe, does the checking:

diff --git a/gcc/testsuite/gfortran.dg/matmul_blas_1.f b/gcc/testsuite/gfortran.dg/matmul_blas_1.f
index 6a88981c9d7..52298d09cce 100644
--- a/gcc/testsuite/gfortran.dg/matmul_blas_1.f
+++ b/gcc/testsuite/gfortran.dg/matmul_blas_1.f
@@ -237,4 +237,4 @@ C Test calling of BLAS routines
       if (any (c /= cres)) stop 20
       end
-! { dg-final { scan-tree-dump-times "_gfortran_matmul" 0 "optimized" } }
+! { dg-final { scan-tree-dump "sgemm" "optimized" } }

-- 
Steve
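
For reference, the cross-over rule mentioned above can be sketched in C
as follows.  This is only an illustration, not the gfortran front-end
code: the function name matmul_calls_blas and the macro
DEFAULT_BLAS_MATMUL_LIMIT are hypothetical, the sketch only models
square N x N operands, and it assumes the comparison is "N greater than
or equal to the limit".

```c
#include <stdbool.h>

/* Default cross-over size quoted in this thread (N=30).  The macro
   name is hypothetical; it is not a GCC identifier.  */
#define DEFAULT_BLAS_MATMUL_LIMIT 30

/* Hypothetical helper: return true if a MATMUL of two N x N matrices
   would be handed to an external BLAS routine such as sgemm_.

   external_blas      models -fexternal-blas being given.
   blas_matmul_limit  models the value of -fblas-matmul-limit=N.
   n                  is the (square) matrix size.

   Assumption: sizes at or above the limit go to BLAS, smaller ones
   are handled by gfortran itself.  */
bool
matmul_calls_blas (bool external_blas, int blas_matmul_limit, int n)
{
  if (!external_blas)
    return false;               /* Without the option, BLAS is never called.  */
  return n >= blas_matmul_limit;
}
```

Under these assumptions, matmul_calls_blas (true,
DEFAULT_BLAS_MATMUL_LIMIT, 29) is false and the same call with n = 30
is true, which is why a test that wants to see sgemm in the dump needs
operands at or above the limit in effect.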