https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78379
--- Comment #2 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---

Here are some measurements with the AVX-enabling patch. They were done
on an AVX machine, namely gcc75 from the compile farm.

This was done with the command line

gfortran -static-libgfortran -finline-matmul-limit=0 -Ofast -o compare_mavx compare_2.f90

Unconditionally setting -mavx in the Makefile for matmul, with stock trunk:

========================================================
================     MEASURED GIGAFLOPS     ============
========================================================
                 Matmul                           Matmul
                  fixed                Matmul   variable
 Size  Loops   explicit  refMatmul    assumed   explicit
========================================================
    2   5000      0.067      0.077      0.051      0.069
    3   5000      0.193      0.218      0.157      0.194
    4   5000      0.429      0.423      0.368      0.435
    5   5000      0.609      0.659      0.556      0.630
    7   5000      0.948      1.018      0.931      1.009
    8   5000      1.608      1.251      1.589      1.715
    9   5000      1.755      1.484      1.745      1.856
   15   5000      2.710      2.175      2.963      3.105
   16   5000      4.289      2.510      4.541      4.784
   17   5000      4.411      3.032      4.675      4.888
   31   5000      6.165      4.395      6.912      6.902
   32   5000      8.800      4.362      8.793      8.809
   33   5000      8.156      4.463      8.145      8.193
   63   5000      9.727      4.364      9.709      9.716
   64   5000     11.828      4.023     11.810     11.798
   65   5000     10.726      4.489     10.654     10.725
  127   3920     12.144      4.292     12.281     12.268
  128   3829     13.829      4.484     13.807     13.841
  129   3741     12.986      4.438     12.964     12.985
  255    483     14.446      4.571     14.462     14.442
  256    477     15.738      4.707     15.744     15.738
  257    472     13.981      4.565     13.995     13.990
  511     60     14.954      4.674     14.977     14.933
  512     59     16.120      4.840     16.137     16.062
  513     59     14.488      4.392     14.497     14.490
 1023      7     15.011      3.573     15.021     14.995
 1024      7     15.938      3.489     15.947     15.938
 1025      7     14.670      3.568     14.683     14.627

With library-side switching
(https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01810.html):

========================================================
================     MEASURED GIGAFLOPS     ============
========================================================
                 Matmul                           Matmul
                  fixed                Matmul   variable
 Size  Loops   explicit  refMatmul    assumed   explicit
========================================================
    2   5000      0.067      0.080      0.053      0.067
    3   5000      0.192      0.226      0.159      0.192
    4   5000      0.427      0.436      0.364      0.431
    5   5000      0.588      0.664      0.543      0.621
    7   5000      0.938      0.914      0.926      1.011
    8   5000      1.589      1.235      1.558      1.671
    9   5000      1.704      1.486      1.694      1.810
   15   5000      2.638      2.175      2.854      3.031
   16   5000      4.234      2.532      4.533      4.745
   17   5000      4.374      3.044      4.677      4.839
   31   5000      6.207      4.401      6.891      6.918
   32   5000      8.824      4.364      8.614      8.603
   33   5000      7.954      4.349      7.945      7.944
   63   5000      8.802      4.369      9.728      9.764
   64   5000     11.845      4.025     11.783     11.849
   65   5000     10.753      4.595     10.719     10.753
  127   3920     12.023      4.314     12.285     12.004
  128   3829     13.427      4.369     13.722     13.742
  129   3741     12.877      4.323     12.668     12.985
  255    483     14.398      4.453     14.336     13.496
  256    477     15.708      4.680     15.711     15.465
  257    472     13.977      4.439     13.965     13.977
  511     60     14.920      4.691     14.937     14.939
  512     59     15.959      4.787     16.084     16.082
  513     59     14.444      4.636     14.464     14.452
 1023      7     14.978      3.448     14.979     14.980
 1024      7     15.903      3.640     15.900     15.905
 1025      7     14.638      3.464     14.626     14.636

With stock trunk:

========================================================
================     MEASURED GIGAFLOPS     ============
========================================================
                 Matmul                           Matmul
                  fixed                Matmul   variable
 Size  Loops   explicit  refMatmul    assumed   explicit
========================================================
    2   5000      0.072      0.078      0.053      0.072
    3   5000      0.199      0.224      0.165      0.200
    4   5000      0.458      0.403      0.387      0.462
    5   5000      0.629      0.661      0.563      0.651
    7   5000      1.073      1.010      1.029      1.131
    8   5000      1.671      1.234      1.637      1.760
    9   5000      1.732      1.465      1.720      1.829
   15   5000      2.895      2.152      3.195      3.349
   16   5000      3.870      2.483      4.168      4.318
   17   5000      3.976      3.029      4.253      4.424
   31   5000      6.210      4.403      6.861      6.868
   32   5000      7.551      4.293      7.544      7.509
   33   5000      7.119      4.418      7.094      7.090
   63   5000      8.742      4.377      8.753      8.728
   64   5000      9.415      4.019      9.384      9.260
   65   5000      8.882      4.540      8.842      8.856
  127   3920     10.073      4.432      9.966      9.988
  128   3829     10.556      4.469     10.552     10.405
  129   3741      9.923      4.428      9.990      9.930
  255    483     10.827      4.569     10.875     10.768
  256    477     11.328      4.705     11.281     11.129
  257    472     10.402      4.492     10.344     10.360
  511     60     10.947      4.674     11.003     10.938
  512     59     11.503      4.842     11.504     11.314
  513     59     10.654      4.672     10.651     10.619
 1023      7     10.941      3.641     10.944     10.863
 1024      7     11.370      3.587     11.261     11.193
 1025      7     10.734      3.601     10.652     10.704

With inlined, -Ofast without -mavx:

========================================================
================     MEASURED GIGAFLOPS     ============
========================================================
                 Matmul                           Matmul
                  fixed                Matmul   variable
 Size  Loops   explicit  refMatmul    assumed   explicit
========================================================
    2   5000      8.979      0.078      0.154      0.241
    3   5000     14.042      0.224      0.348      0.451
    4   5000      1.686      0.435      0.500      0.707
    5   5000      1.989      0.617      0.577      0.829
    7   5000      2.163      0.846      0.783      1.123
    8   5000      3.742      1.224      0.879      1.322
    9   5000      2.764      1.420      0.996      1.458
   15   5000      3.461      2.108      1.305      2.420
   16   5000      4.395      2.589      1.619      2.901
   17   5000      5.238      3.291      1.934      3.579
   31   5000      7.207      4.434      2.347      4.385
   32   5000      7.318      4.306      2.351      4.329
   33   5000      7.204      4.466      2.052      4.421
   63   5000      4.688      4.365      2.486      4.700
   64   5000      4.246      4.022      2.480      4.664
   65   5000      4.238      4.355      2.486      4.703
  127   3920      4.411      4.427      2.821      4.340
  128   3829      4.365      4.481      2.846      4.434
  129   3741      4.427      4.441      2.828      4.396
  255    483      4.561      4.569      2.972      4.517
  256    477      4.666      4.701      2.905      4.685
  257    472      4.520      4.573      2.974      4.550
  511     60      4.669      4.675      3.075      4.666
  512     59      4.823      4.843      3.095      4.835
  513     59      4.655      4.672      3.077      4.651
 1023      7      3.555      3.563      2.718      3.554
 1024      7      3.519      3.529      2.713      3.519
 1025      7      3.527      3.543      2.715      3.536

With inline version with -mavx:

========================================================
================     MEASURED GIGAFLOPS     ============
========================================================
                 Matmul                           Matmul
                  fixed                Matmul   variable
 Size  Loops   explicit  refMatmul    assumed   explicit
========================================================
    2   5000      8.990      0.074      0.155      0.206
    3   5000      7.488      0.212      0.304      0.396
    4   5000      1.773      0.342      0.501      0.533
    5   5000      2.000      0.552      0.615      0.739
    7   5000      2.163      0.919      0.807      1.057
    8   5000      3.369      1.388      0.905      1.578
    9   5000      2.694      1.347      1.020      1.492
   15   5000      3.441      2.201      1.325      2.631
   16   5000      1.831      3.399      1.677      4.137
   17   5000      4.554      3.461      1.976      4.120
   31   5000      7.111      5.286      2.372      5.712
   32   5000      8.384      5.887      2.040      6.725
   33   5000      7.218      5.374      2.057      5.798
   63   5000      8.131      6.107      2.477      6.418
   64   5000      8.707      6.518      2.313      7.228
   65   5000      7.768      6.003      2.427      4.503
  127   3920      6.714      5.688      2.761      6.293
  128   3829      7.067      6.688      2.777      6.880
  129   3741      6.277      6.023      2.765      6.296
  255    483      6.036      5.681      2.877      5.765
  256    477      6.177      5.869      2.921      5.917
  257    472      6.017      5.687      2.880      5.766
  511     60      6.156      5.878      2.848      5.920
  512     59      6.338      6.107      3.026      6.092
  513     59      6.125      5.826      2.954      5.817
 1023      7      4.130      4.111      2.623      4.104
 1024      7      4.270      4.219      2.667      4.198
 1025      7      4.206      4.159      2.616      4.149
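
As a reading aid for the tables above, the following is a minimal Python sketch that turns a few of the measured values into speedup ratios relative to stock trunk. It is not part of the benchmark or the patch. The numbers are copied from the "Matmul fixed explicit" column of the stock trunk, library-side switching and unconditional -mavx tables. The names FIXED_EXPLICIT_GFLOPS and speedups are made up for this illustration, and the choice of sizes is arbitrary.

```python
# Gigaflops from the "Matmul fixed explicit" column of the tables above.
# Keys are matrix sizes; each value is
# (stock trunk, library-side switching, unconditional -mavx).
# The dictionary name and the selection of sizes are illustrative only.
FIXED_EXPLICIT_GFLOPS = {
    16: (3.870, 4.234, 4.289),
    64: (9.415, 11.845, 11.828),
    256: (11.328, 15.708, 15.738),
    1024: (11.370, 15.903, 15.938),
}


def speedups(table):
    """Return {size: (switching / stock, mavx / stock)} for each size."""
    return {
        size: (switching / stock, mavx / stock)
        for size, (stock, switching, mavx) in table.items()
    }


if __name__ == "__main__":
    print(" Size  switching/stock  -mavx/stock")
    for size, (s, m) in speedups(FIXED_EXPLICIT_GFLOPS).items():
        print(f"{size:5d}  {s:15.2f}  {m:11.2f}")
```

For the sizes picked here, the two AVX variants stay within about two percent of each other, and at size 1024 both come out at roughly 1.4 times the stock trunk figure.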