Hi,
I ran some testing on the soon-to-be-committed matmul patch.
Specifically, I tried out what putting -march=native into
libgfortran's Makefile does.
Here is the performance data with the new code without -march=native.
The interesting numbers are the ones for Matmul fixed explicit,
for size>=32.
========================================================
================   MEASURED GIGAFLOPS   ================
========================================================
                 Matmul                           Matmul
                  fixed                Matmul   variable
 Size  Loops   explicit  refMatmul    assumed   explicit
========================================================
    2   5000      0.067      0.079      0.053      0.064
    4   5000      0.440      0.444      0.364      0.434
    8   5000      1.405      1.152      1.368      1.495
   16   5000      2.805      1.885      3.172      3.444
   32   5000      4.943      3.627      7.267      7.510
   64   5000      9.037      4.028      9.036      9.157
  128   3829     10.181      4.452      9.932     10.333
  256    477     10.398      4.720     10.919     11.158
  512     59     11.173      4.853     11.172     11.356
 1024      7     11.074      3.616     11.075     11.266
With -march=native:
========================================================
================   MEASURED GIGAFLOPS   ================
========================================================
                 Matmul                           Matmul
                  fixed                Matmul   variable
 Size  Loops   explicit  refMatmul    assumed   explicit
========================================================
    2   5000      0.064      0.080      0.051      0.064
    4   5000      0.406      0.450      0.347      0.407
    8   5000      1.342      1.124      1.364      1.437
   16   5000      2.989      1.865      3.427      3.760
   32   5000      5.543      3.481      8.203      8.700
   64   5000     11.632      4.021     11.647     11.729
  128   3829     13.968      4.372     13.966     14.046
  256    477     15.778      4.717     15.780     15.761
  512     59     16.102      4.855     16.075     16.109
 1024      7     15.867      3.596     15.884     15.886
So, there could be quite some gain in performance if this
could be exploited (roughly 40 to 50% for Matmul fixed explicit
at sizes of 256 and up on this machine); even more with wider
vector extensions like AVX-512, I suspect.
Do you think this is worth pursuing? If so, how could/should this
be implemented? Does anybody do this kind of thing in gcc yet?
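To make the question more concrete, here is a rough sketch of one way
it might work: compile the same kernel twice, once for the baseline
and once with AVX2/FMA enabled through the target attribute, and pick
one at run time with __builtin_cpu_supports. The names matmul_vanilla,
matmul_avx2 and matmul_dispatch are made up for illustration, and the
plain triple loop only stands in for the real blocked kernel; this is
not what the patch does. The target_clones attribute in newer gcc
versions might be an alternative to doing the dispatch by hand.

```c
#include <stddef.h>

#if defined(__GNUC__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#endif

typedef void (*matmul_fn)(size_t, const double *, const double *, double *);

/* Stand-in kernel: c = a * b for n x n row-major matrices.
   Forced inline so that each wrapper below compiles its own copy
   with that wrapper's target options. */
static ALWAYS_INLINE void
matmul_body(size_t n, const double *a, const double *b, double *c)
{
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      {
        double s = 0.0;
        for (size_t k = 0; k < n; k++)
          s += a[i * n + k] * b[k * n + j];
        c[i * n + j] = s;
      }
}

/* Baseline version, built with whatever -march the library uses. */
static void
matmul_vanilla(size_t n, const double *a, const double *b, double *c)
{
  matmul_body(n, a, b, c);
}

#ifdef HAVE_X86_DISPATCH
/* Same source, but the compiler may use AVX2 and FMA here. */
__attribute__((target("avx2,fma")))
static void
matmul_avx2(size_t n, const double *a, const double *b, double *c)
{
  matmul_body(n, a, b, c);
}
#endif

/* Choose an implementation based on what the running CPU supports. */
static matmul_fn
select_matmul(void)
{
#ifdef HAVE_X86_DISPATCH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
    return matmul_avx2;
#endif
  return matmul_vanilla;
}

/* Hypothetical entry point; caches the choice after the first call.
   Not thread-safe as written. */
void
matmul_dispatch(size_t n, const double *a, const double *b, double *c)
{
  static matmul_fn impl;
  if (!impl)
    impl = select_matmul();
  impl(n, a, b, c);
}
```

A caller would just use matmul_dispatch(n, a, b, c) and never see
which version runs. The cost is one indirect call per matmul, which
should not matter for the sizes where -march=native helps.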
Regards
Thomas