http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49749

William J. Schmidt <wschmidt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |wschmidt at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #9 from William J. Schmidt <wschmidt at gcc dot gnu.org> 2011-07-20 16:28:49 UTC ---
Created attachment 24798
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24798
Proposed patch

I'm attaching a patch that solves this issue. The patch was produced against the ibm/gcc-4_6-branch, but should apply OK to trunk -- I'll verify that later if the direction of this patch is acceptable. This regains the 20% performance loss we had experienced in 410.bwaves, and also gives a 5% boost to 444.namd. No significant degradations were observed on SPEC cpu2006.

In addition to fixing the operand access problems, the patch looks for loop-carried dependencies in innermost loops, and biases the reassociation so that the phi target is summed last. The purpose of this is to identify accumulator variables in inner loops and make each iteration of their accumulations independent. When these loops are unrolled, multiple independent iterations can be interleaved for improved performance. The optimization is restricted to innermost loops to avoid unnecessarily raising register pressure.

There may be a better way to achieve the bias than what I've chosen here, so please comment. If the general approach is acceptable, I'll apply comments and submit the patch for approval.
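
To illustrate the accumulator bias described above, here is a minimal C sketch. It is not taken from the patch: the function names and the three-operand sum are hypothetical, and it only shows, at source level, the two association orders the reassociation pass can choose between. Note that GCC reassociates floating-point sums only when this is permitted (for example with -ffast-math or -fassociative-math), so the rewrite here is done by hand.

```c
#include <stddef.h>

/* Accumulator summed first: every add in an iteration depends on the
   value of `sum` carried from the previous iteration, so the whole
   chain of three adds is serialized across iterations.  */
double sum_acc_first(const double *a, const double *b, const double *c,
                     size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum = ((sum + a[i]) + b[i]) + c[i];
    return sum;
}

/* Accumulator (the phi target) summed last: a[i] + b[i] + c[i] does not
   depend on `sum`, so only the final add is on the loop-carried path.
   After unrolling, the independent partial sums of several iterations
   can be interleaved.  */
double sum_acc_last(const double *a, const double *b, const double *c,
                    size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum = ((a[i] + b[i]) + c[i]) + sum;
    return sum;
}
```

In the second form the loop-carried dependency chain is one add per iteration instead of three, which is the property the bias is meant to create. With general floating-point inputs the two forms may round differently, which is why the compiler needs permission to reassociate.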