https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119

--- Comment #14 from Janne Blomqvist <jb at gcc dot gnu.org> ---
(In reply to Dominique d'Humieres from comment #12)
> I suppose most modern OS provide such optimized BLAS and, if not, one can
> install libraries such as atlas. So I wonder if it would not be more
> effective to be able to configure with something such as --with-blas="magic
> incantation" and use -fexternal-blas as the default rather than reinventing
> the wheel.

This matches my current thinking on this subject. 

To get good performance one really needs arch-specific parameters (block sizes
to fit into cache etc.), as well as arch-specific code to make maximum use of
the vector ISA. Add in threading, which is useful for larger matrices, and
there's a lot more work than the current GFortran development team is able to
commit to.

So my idea of what ought to be done:

- Check for the presence of BLAS at compile time. Alternatively, use weak
references so we can always use BLAS if it's available, without the user having
to specify -fexternal-blas (which I guess most users don't).

  - A problem here: if the system has multiple BLAS libraries, which one do we
choose? And different systems have different ways of linking to BLAS
(e.g. -framework Accelerate on OSX).

  - And what about BLAS64, i.e. BLAS compiled with 64-bit integers? It seems
these libraries have the same API as the "normal" BLAS, so how do we figure out
at build time which kind of BLAS library we are using?

- Currently with -fexternal-blas we only use BLAS for stride-1 arrays, falling
back to the current code for stride /= 1. It's probably more efficient to pack
stride /= 1 arrays and then call BLAS. Heck, high performance BLAS libraries
repack blocks to get better cache behavior anyways.
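
To make the two ideas above concrete (a weak reference to BLAS, plus packing
stride /= 1 operands before the call), here is a rough C sketch. It is not
libgfortran code: the names pack_strided, gemm_fallback and matmul_packed are
made up for illustration, the dgemm_ symbol name assumes the usual
trailing-underscore Fortran mangling, and the hidden string-length arguments
that some Fortran ABIs pass for the two character arguments are left out for
brevity. The weak attribute is a GCC extension and behaves as shown on ELF
targets; other platforms may need something different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Weak reference: if no BLAS is linked in, the address of dgemm_ is NULL
   (ELF/GCC behaviour; symbol name and ABI are assumptions, see above). */
extern void dgemm_(const char *transa, const char *transb,
                   const int *m, const int *n, const int *k,
                   const double *alpha, const double *a, const int *lda,
                   const double *b, const int *ldb,
                   const double *beta, double *c, const int *ldc)
    __attribute__((weak));

/* Copy a rows x cols matrix with arbitrary element strides into a
   contiguous column-major buffer. */
static void pack_strided(const double *src, int rows, int cols,
                         ptrdiff_t row_stride, ptrdiff_t col_stride,
                         double *dst)
{
    for (int j = 0; j < cols; j++)
        for (int i = 0; i < rows; i++)
            dst[i + (size_t)j * rows] = src[i * row_stride + j * col_stride];
}

/* Naive column-major C = A * B, used when no BLAS is available. */
static void gemm_fallback(int m, int n, int k,
                          const double *a, const double *b, double *c)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++) {
            double sum = 0.0;
            for (int l = 0; l < k; l++)
                sum += a[i + (size_t)l * m] * b[l + (size_t)j * k];
            c[i + (size_t)j * m] = sum;
        }
}

/* C (m x n, contiguous column-major) = A (m x k) * B (k x n), where A and B
   may have arbitrary strides. Returns 0 on success, -1 if allocation fails. */
int matmul_packed(int m, int n, int k,
                  const double *a, ptrdiff_t a_rs, ptrdiff_t a_cs,
                  const double *b, ptrdiff_t b_rs, ptrdiff_t b_cs,
                  double *c)
{
    assert(m > 0 && n > 0 && k > 0);

    double *pa = malloc((size_t)m * k * sizeof *pa);
    double *pb = malloc((size_t)k * n * sizeof *pb);
    if (!pa || !pb) {
        free(pa);
        free(pb);
        return -1;
    }

    /* Pack both operands so the multiply always sees stride-1 data. */
    pack_strided(a, m, k, a_rs, a_cs, pa);
    pack_strided(b, k, n, b_rs, b_cs, pb);

    if (dgemm_) {
        /* A BLAS was linked in: hand the packed operands to it. */
        const double one = 1.0, zero = 0.0;
        dgemm_("N", "N", &m, &n, &k, &one, pa, &m, pb, &k, &zero, c, &m);
    } else {
        gemm_fallback(m, n, k, pa, pb, c);
    }

    free(pa);
    free(pb);
    return 0;
}
```

Whether the extra copy pays off presumably depends on the matrix size; for
small matrices the allocation and packing could easily cost more than the
multiply itself, so a real implementation would want a size cutoff.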


> 
> More than three years ago Janne Blomqvist (comment 7) wrote
> > IIRC I reached about 30-40 % of peak flops which was a bit disappointing.
> 
> Would it be possible to have the patch to play with?

My GCC dev box, where I think this stuff might reside, is packed away in a box
as I have recently moved. But I'll keep this in mind, and see if I can find the
patch once I get around to unpacking.

As an aside, contrary to when I implemented my patch based on reading the
papers by Goto et al., nowadays there's a nice step-by-step description at

http://apfel.mathematik.uni-ulm.de/~lehn/sghpc/gemm/index.html
