http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59643
Bug ID: 59643
Summary: Predictive commoning unnecessarily punts on scimark2 SOR
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jakub at gcc dot gnu.org

I've noticed that GCC performs badly on scimark2 SOR compared to LLVM 3.3 and 3.4, and I believe the difference is in predictive commoning, which IMHO unnecessarily gives up on the loop.

https://cmssdt.cern.ch/SDT/lxr/source/Validation/Performance/bin/SOR.c?v=Sat

The inner loop is:

  for (j=1; j<Nm1; j++)
    Gi[j] = omega_over_four * (Gim1[j] + Gip1[j] + Gi[j-1] + Gi[j+1])
            + one_minus_omega * Gi[j];

The problem is that the data-reference analysis doesn't know that the Gim1[j] and Gip1[j] reads don't conflict with the Gi[j] write. They don't in the benchmark, but the compiler can't know that, unless with -flto some extra-smart IPA analysis could prove it. That is primarily a bad choice of data structures in the benchmark: instead of an array of pointers to double where each inner array is malloced separately, using a two-dimensional array might make it clear to the compiler that there is no aliasing.

When constructing components, pcom ignores read-read dependencies whose offset can't be determined. Here, however, one of the references is a write, so all the data references are put into the same component, and that component is then rejected as unsuitable because the offset can't be determined.

For two writes with an unknown dependence nothing can be done, but I wonder whether, for a (suitable) write and some other read whose offset we can't determine, we really have to give up on both data refs rather than just on the read. On this testcase, giving up on the Gim1[j] and Gip1[j] reads, which could possibly overlap with the Gi[j] write, is IMHO fine: we just keep them as they are and don't attempt to optimize them. pcom doesn't optimize away writes either (or does it? If it does, we'd need to mark the component so that it doesn't do that in this case.)

With the untested patch I'll attach, scimark2 SOR improved from

  SOR Mflops: 1135.50 (1000 x 1000)

to

  SOR Mflops: 1617.87 (1000 x 1000)
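For illustration, here is a hand-written sketch of roughly what predictive commoning could produce if only the Gi references are commoned and the Gim1[j] and Gip1[j] reads are left alone. The function names sor_row and sor_row_pcom and their parameter lists are hypothetical; the loop body is the one quoted above. Gi[j-1] is carried in a register from the value just stored, and Gi[j] from the Gi[j+1] load of the previous iteration, so at the source level each iteration loads three values instead of five. Because Gim1 and Gip1 are only ever read, and are still reloaded every iteration after the preceding store, the rewrite stays correct even if they overlap Gi. This is a sketch of the idea, not GCC's actual output.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: the SOR inner loop as quoted in the report. */
void sor_row(double *Gi, const double *Gim1, const double *Gip1, int N,
             double omega_over_four, double one_minus_omega)
{
    int Nm1 = N - 1;
    for (int j = 1; j < Nm1; j++)
        Gi[j] = omega_over_four * (Gim1[j] + Gip1[j] + Gi[j-1] + Gi[j+1])
                + one_minus_omega * Gi[j];
}

/* Hypothetical hand-commoned version: Gi[j-1] and Gi[j] are carried in
   locals across iterations, while Gim1[j] and Gip1[j] are still loaded
   from memory every iteration, so any overlap with Gi is harmless. */
void sor_row_pcom(double *Gi, const double *Gim1, const double *Gip1, int N,
                  double omega_over_four, double one_minus_omega)
{
    assert(Gi != NULL && Gim1 != NULL && Gip1 != NULL);
    int Nm1 = N - 1;
    if (Nm1 <= 1)
        return;                 /* the original loop would not run either */
    double left = Gi[0];        /* Gi[j-1] for j == 1 */
    double mid = Gi[1];         /* Gi[j]   for j == 1 */
    for (int j = 1; j < Nm1; j++) {
        double right = Gi[j+1]; /* the only Gi load per iteration */
        double v = omega_over_four * (Gim1[j] + Gip1[j] + left + right)
                   + one_minus_omega * mid;
        Gi[j] = v;
        left = v;               /* next Gi[j-1] is the value just stored */
        mid = right;            /* next Gi[j] was loaded this iteration and
                                   only Gi[j] has been stored since */
    }
}
```

Called on three distinct rows of a grid, or on a deliberately overlapping layout such as Gim1 == Gi - 1, both functions should leave the same values in memory, which is why dropping just the unanalyzable reads from the component, rather than the whole component, looks sufficient here.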
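The two-dimensional-array alternative mentioned in the report could look like the following sketch of one SOR sweep. It uses a C99 variably modified parameter, so rows i-1, i and i+1 sit exactly N elements apart instead of in separately malloced rows. The name sor_sweep_2d is hypothetical, and deriving omega_over_four and one_minus_omega from a single omega parameter is an assumption of this sketch. Whether GCC's dependence analysis actually resolves the offsets in this form, and whether pcom then succeeds, is not verified here.

```c
#include <assert.h>

/* One SOR sweep over an M x N grid held in a true 2-D array.
   G[i-1][j], G[i][j] and G[i+1][j] are N elements apart, so their
   relative offsets are known at compile time in terms of N. */
void sor_sweep_2d(int M, int N, double G[M][N], double omega)
{
    assert(M > 0 && N > 0);      /* array dimensions must be positive */
    double omega_over_four = omega * 0.25;
    double one_minus_omega = 1.0 - omega;
    int Mm1 = M - 1, Nm1 = N - 1;
    for (int i = 1; i < Mm1; i++)
        for (int j = 1; j < Nm1; j++)
            G[i][j] = omega_over_four
                      * (G[i-1][j] + G[i+1][j] + G[i][j-1] + G[i][j+1])
                      + one_minus_omega * G[i][j];
}
```

The arithmetic and the traversal order are the same as in the array-of-pointers loop, so on the same input data the two layouts should produce the same grid; only what the compiler can prove about aliasing changes.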