http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58453
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Loop distribution distributes this into

      q2 = ( g2a(i+1) * g31a(i+1) * v1(i+1,j,k)
     1     - g2a(i  ) * g31a(i  ) * v1(i  ,j,k) )
     2     * dvl1ai(i)
     3     + ( g32a(j+1) * v2(i,j+1,k)
     4       - g32a(j  ) * v2(i,j  ,k) )
     5     * g2bi(i) * dvl2ai(j)
     6     + ( v3(i,j,k+1) - v3(i,j,k) )
     7     * g31bi(i) * g32bi(j) * dvl3ai(k)
      q2 = q2 * q1
      e(i,j,k) = ( 1.0 - q2 ) / ( 1.0 + q2 ) * e(i,j,k)

and a memcpy for

      dlo(i,j,k) = d(i,j,k)

and

      eod(i,j,k) = e(i,j,k) / d(i,j,k)

(re-computing e(i,j,k) instead of loading it from the stored value - a
known deficiency).  This doesn't look wrong at first glance (but it's
probably slower).

What the revision in question changed was to remove some very odd code
from rdg_flag_uses:

-  if (gimple_code (stmt) != GIMPLE_PHI)
-    {
-      if ((use_p = gimple_vuse_op (stmt)) != NULL_USE_OPERAND_P)
-        {
-          tree use = USE_FROM_PTR (use_p);
-
-          if (TREE_CODE (use) == SSA_NAME
-              && !SSA_NAME_IS_DEFAULT_DEF (use))
-            {
-              gimple def_stmt = SSA_NAME_DEF_STMT (use);
-              int v = rdg_vertex_for_stmt (rdg, def_stmt);
-
-              if (v >= 0
-                  && !already_processed_vertex_p (processed, v))
-                rdg_flag_vertex_and_dependent (rdg, v, partition, loops,
-                                               processed);
-            }
-        }
-    }

That code just doesn't make sense, but it likely made sure everything
ended up in a single partition.

Does the benchmark fail if you build with -ftree-loop-distribution
-fno-tree-loop-distribute-patterns?  (It should emit a loop instead of
the memcpy call.)