http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60997

            Bug ID: 60997
           Summary: -fopenmp conflicts with -floop-interchange
           Product: gcc
           Version: 4.10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dominiq at lps dot ens.fr
                CC: grosser at gcc dot gnu.org, jakub at gcc dot gnu.org,
                    mircea.namolaru at inria dot fr

Created attachment 32703
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32703&action=edit
Test for three variants of the matrix product

Compiling the attached code with -Ofast gives the following timing at run time

[Book15] Fortran/omp_tst% gfc -Ofast omp_tst_4_db_2.f90
[Book15] Fortran/omp_tst% time a.out
   94378416668672.000
 Elapsed time =    3.7326660000000000      seconds
   94378416668672.000
 Elapsed time =   0.57225000000000004      seconds
   94378416668672.000
 Elapsed time =    6.9233669999999998      seconds
   94378416668672.000
 Elapsed time =   0.47757300000000003      seconds
11.704u 0.030s 0:11.73 100.0%    0+0k 0+0io 2pf+0w

Adding -floop-interchange at compile time gives

[Book15] Fortran/omp_tst% gfc -Ofast omp_tst_4_db_2.f90 -floop-interchange
[Book15] Fortran/omp_tst% time a.out
   94378416668672.000
 Elapsed time =   0.57357899999999995      seconds
   94378416668672.000
 Elapsed time =   0.56863100000000000      seconds
   94378416668672.000
 Elapsed time =   0.56851499999999999      seconds
   94378416668672.000
 Elapsed time =   0.47033199999999997      seconds
2.195u 0.015s 0:02.21 99.5%    0+0k 0+0io 0pf+0w

i.e., the three variants of the loop are transformed to the fastest one.

Adding -fopenmp (and -fexternal-blas -framework vecLib) gives

[Book15] Fortran/omp_tst% gfc -Ofast omp_tst_4_db_2.f90 -floop-interchange -fopenmp -fexternal-blas -framework vecLib
[Book15] Fortran/omp_tst% time a.out
   94378416668672.000
 Elapsed time =    1.8143670000000001      seconds
   94378416668672.000
 Elapsed time =   0.12886900000000001      seconds
   94378416668672.000
 Elapsed time =    2.0025420000000000      seconds
   94378416668672.000
 Elapsed time =    2.9204999999999998E-002 seconds
31.030u 0.064s 0:04.00 777.2%    0+0k 4+4io 2pf+0w

i.e., the loop interchange is prevented by the -fopenmp option. This is
probably due to the fact that the -fopenmp option is processed before the
graphite optimizations.

The last timings are for the MATMUL intrinsic as a reference (using the system
BLAS gives a 15 times speed-up).
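
The attachment is not reproduced in this message, so the following is only a rough sketch of what "three variants of the matrix product" could look like. It is written in C rather than Fortran, and the function names, the flat row-major layout, and the choice of loop orders are all assumptions made for illustration, not the reporter's code. The three functions compute the same product and differ only in loop nesting order, which is the property a loop-interchange pass exploits. Note that C is row-major while Fortran is column-major, so the order that is cache-friendly here (i-k-j) is not the one that would be fastest in the Fortran test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: zero an n*n matrix stored as a flat row-major array. */
static void zero_matrix(size_t n, double *c) {
    for (size_t i = 0; i < n * n; ++i)
        c[i] = 0.0;
}

/* Variant 1 (assumed): i-j-k order. The inner loop reads b with stride n. */
void matmul_ijk(size_t n, const double *a, const double *b, double *c) {
    assert(c != a && c != b);   /* output must not alias the inputs */
    zero_matrix(n, c);
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            for (size_t k = 0; k < n; ++k)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* Variant 2 (assumed): i-k-j order. The inner loop is unit stride in b and c,
   which is the cache-friendly order for row-major storage. */
void matmul_ikj(size_t n, const double *a, const double *b, double *c) {
    assert(c != a && c != b);
    zero_matrix(n, c);
    for (size_t i = 0; i < n; ++i)
        for (size_t k = 0; k < n; ++k)
            for (size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* Variant 3 (assumed): j-k-i order. The inner loop strides by n in a and c. */
void matmul_jki(size_t n, const double *a, const double *b, double *c) {
    assert(c != a && c != b);
    zero_matrix(n, c);
    for (size_t j = 0; j < n; ++j)
        for (size_t k = 0; k < n; ++k)
            for (size_t i = 0; i < n; ++i)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}
```

Because the loop bodies are identical, an optimizer that can prove the interchange legal may rewrite all three nests into the same order, which would be consistent with the near-identical timings reported with -floop-interchange alone.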