https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119

--- Comment #32 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
Created attachment 39985
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39985&action=edit
Proposed patch to get testing going

This patch works pretty well for me. My results are as follows:

gfortran version 6:

$ gfc6 -static -O2 -finline-matmul-limit=0 compare.f90 
[jerry@quasar pr51119]$ ./a.out 
 =========================================================
 ================            MEASURED GIGAFLOPS          =
 =========================================================
                 Matmul                           Matmul
                 fixed                 Matmul     variable
 Size  Loops     explicit   refMatmul  assumed    explicit
 =========================================================
    2  2000      0.086      0.054      0.060      0.098
    4  2000      0.288      0.302      0.256      0.315
    8  2000      0.799      0.830      2.094      2.246
   16  2000      4.045      2.539      4.198      4.266
   32  2000      5.358      2.301      5.340      5.335
   64  2000      5.411      2.207      5.391      5.395
  128  2000      5.918      2.416      5.919      5.915
  256   477      5.871      2.393      5.870      5.869
  512    59      2.927      1.891      2.927      2.928
 1024     7      1.668      1.182      1.667      1.668
 2048     1      1.763      1.526      1.763      1.763

gfortran version 7:

$ gfc -static -O2 -finline-matmul-limit=0 compare.f90 
[jerry@quasar pr51119]$ ./a.out 
 =========================================================
 ================            MEASURED GIGAFLOPS          =
 =========================================================
                 Matmul                           Matmul
                 fixed                 Matmul     variable
 Size  Loops     explicit   refMatmul  assumed    explicit
 =========================================================
    2  2000      0.053      0.052      0.043      0.054
    4  2000      0.310      0.304      0.277      0.377
    8  2000      0.704      0.858      1.711      1.758
   16  2000      2.805      2.529      2.798      2.780
   32  2000      4.693      2.210      4.700      4.821
   64  2000      6.768      2.038      6.732      6.782
  128  2000      8.550      2.419      8.647      8.595
  256   477      9.442      2.378      9.425      9.446
  512    59      8.565      1.960      8.641      8.568
 1024     7      8.537      1.178      8.610      8.530
 2048     1      8.576      1.512      8.652      8.582

A portion of the speedup comes from using:

#pragma GCC optimize ( "-Ofast" )

which I just discovered. I think addition and subtraction are fairly safe with
this option; however, I do not know if it is acceptable for release, since it
may conflict with something on some platform or even with a gcc policy. But
hey, it worked for me.
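
For anyone who wants to experiment with the pragma, here is a minimal sketch of
one way to confine it to a single kernel. This is not the code from the
attached patch: the function name matmul_ofast, the row-major layout, and the
plain triple loop are all assumptions made for illustration. It uses GCC's
push_options / pop_options pragmas so that the rest of the translation unit
keeps the command-line optimization settings.

```c
#include <assert.h>
#include <stddef.h>

/* Save the command-line optimization settings so that the pragma
   below only affects code up to the matching pop_options. */
#pragma GCC push_options
#pragma GCC optimize ("-Ofast")

/* Hypothetical kernel (not from the patch): c = a * b for n-by-n
   matrices stored row-major.  The output c must not overlap a or b. */
void matmul_ofast(size_t n, const double *restrict a,
                  const double *restrict b, double *restrict c)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        /* Clear row i of the result before accumulating into it. */
        for (size_t j = 0; j < n; j++)
            c[i * n + j] = 0.0;
        /* i-k-j order keeps the innermost loop at unit stride in b and c. */
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}

/* Restore the settings that were in effect before push_options. */
#pragma GCC pop_options
```

Since -Ofast turns on -ffast-math, results inside the pragma region may differ
in the last bits from a strict IEEE evaluation, which is the acceptability
question raised above. Compilers that do not recognize these pragmas will
typically ignore them, in which case the function is simply built with the
command-line flags.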

Much testing is needed. There is a nice sweet spot at size 256. This is on a
single thread on a 3.8 GHz core.
