Library routine switching based on -march

Thomas Koenig Tue, 15 Nov 2016 03:39:04 -0800

Hi,

I ran some tests on the soon-to-be-committed matmul patch.
Specifically, I tried out the effect of putting -march=native into
libgfortran's Makefile.


Here is the performance data with the new code without -march.
The interesting numbers are the ones for Matmul fixed explicit,
for size>=32.

 =========================================================
 ================            MEASURED GIGAFLOPS          =
 =========================================================
                 Matmul                           Matmul
                 fixed                 Matmul     variable
 Size  Loops     explicit   refMatmul  assumed    explicit
 =========================================================
    2  5000      0.067      0.079      0.053      0.064
    4  5000      0.440      0.444      0.364      0.434
    8  5000      1.405      1.152      1.368      1.495
   16  5000      2.805      1.885      3.172      3.444
   32  5000      4.943      3.627      7.267      7.510
   64  5000      9.037      4.028      9.036      9.157
  128  3829     10.181      4.452      9.932     10.333
  256   477     10.398      4.720     10.919     11.158
  512    59     11.173      4.853     11.172     11.356
 1024     7     11.074      3.616     11.075     11.266

With -march=native:

 =========================================================
 ================            MEASURED GIGAFLOPS          =
 =========================================================
                 Matmul                           Matmul
                 fixed                 Matmul     variable
 Size  Loops     explicit   refMatmul  assumed    explicit
 =========================================================
    2  5000      0.064      0.080      0.051      0.064
    4  5000      0.406      0.450      0.347      0.407
    8  5000      1.342      1.124      1.364      1.437
   16  5000      2.989      1.865      3.427      3.760
   32  5000      5.543      3.481      8.203      8.700
   64  5000     11.632      4.021     11.647     11.729
  128  3829     13.968      4.372     13.966     14.046
  256   477     15.778      4.717     15.780     15.761
  512    59     16.102      4.855     16.075     16.109
 1024     7     15.867      3.596     15.884     15.886

So, there could be quite some gain in performance if this
could be exploited; even more on architectures with AVX-512,
I suspect.
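
To put a number on the gain, here is a small sketch that divides the
-march=native figures by the default ones for the "Matmul fixed explicit"
column at size>=32. The values are copied from the two tables above; the
names (speedup, print_speedups and the arrays) are made up for this
illustration only.

```c
#include <stdio.h>

#define NSIZES 6

/* "Matmul fixed explicit" column, sizes 32..1024, copied from the tables. */
static const int sizes[NSIZES] = {32, 64, 128, 256, 512, 1024};
static const double gflops_default[NSIZES] =
    {4.943, 9.037, 10.181, 10.398, 11.173, 11.074};
static const double gflops_native[NSIZES] =
    {5.543, 11.632, 13.968, 15.778, 16.102, 15.867};

/* Ratio of tuned to baseline throughput; values above 1 mean a gain. */
static double speedup(double baseline, double tuned)
{
    return tuned / baseline;
}

/* Print one line per matrix size: size, then the speedup factor. */
static void print_speedups(void)
{
    for (int i = 0; i < NSIZES; i++)
        printf("%5d  %.3f\n", sizes[i],
               speedup(gflops_default[i], gflops_native[i]));
}
```

Calling print_speedups() from a main() lists the ratios; for the largest
sizes the factor comes out at roughly 1.4 to 1.5.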

Do you think this is worth pursuing?  If so, how could/should this
be implemented?  Does anybody do this kind of thing in gcc yet?
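
For illustration only, one possible shape for such switching is a runtime
check of the CPU features that picks between two builds of the same kernel.
The sketch below assumes GCC (or a compatible compiler) on x86 and uses its
__builtin_cpu_supports() built-in and the target function attribute. The
function names (matmul_generic, matmul_avx2, select_matmul) and the naive
kernel are hypothetical and are not taken from libgfortran.

```c
#include <stddef.h>

/* Naive c = a * b for n x n row-major matrices, default code generation. */
static void matmul_generic(size_t n, const double *a, const double *b,
                           double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_AVX2_VARIANT 1
/* Same source, but the compiler may use AVX2 instructions in this one. */
__attribute__((target("avx2")))
static void matmul_avx2(size_t n, const double *a, const double *b,
                        double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
}
#endif

typedef void (*matmul_fn)(size_t, const double *, const double *, double *);

/* Pick a variant once, based on what the running CPU reports. */
static matmul_fn select_matmul(void)
{
#ifdef HAVE_AVX2_VARIANT
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return matmul_avx2;
#endif
    return matmul_generic;
}
```

A caller would fetch the pointer once with select_matmul() and reuse it, so
the feature check is not repeated on every multiplication. On non-x86
targets the sketch simply falls back to the generic version.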

Regards

        Thomas
