On 9/9/07, Roger Sayle <[EMAIL PROTECTED]> wrote: > > > This is an optimization pass which leads to dramatically better code on > > > at least one SPEC benchmark. Ian, Roger, Diego, would one of you care > > to review this?
Btw, diego already approved the patch. > My concern is that as formulated, conditional store elimination is not > always a win. > > Transforming > > if (cond) > *p = x; > > into > > tmp = *p; > if (cond) > tmp = x; > *p = tmp; > > on it's own, effectively transforms a conditional write to memory into an > unconditional write to memory. The pass currently only does the transformation if there is a preceding load already. Because I raised this same issue ;) > On many platforms, even x86, this a pessimization. For example, the "Intel > Architecture Optimization Manual", available at > ftp://download.intel.com/design/PentiumII/manuals/24281603.PDF in section > 3.5.5 "Write Allocation Effects", actually recommends the inverse > transformation. On page 3-21 they show how the "Sieve of Erastothenes" > benchmark can be sped up on Pentium class processors by transforming the > line > > array[j] = 0; > > into the equivalent > > if (array[j] != 0) > array[j] = 0; > > i.e. by introducing conditional stores. I suppose this is no longer true on modern processors? I think the pass may undo this case. But I'm also not sure we should worry about this. > The significant observation with Michael Matz's extremely impressive 26% > improvement on 456.hmmer is the interaction between this transformation with > other passes, that allow the conditional store to be hoisted out of a > critical loop. By reading the value into a "tmp" before the loop, > conditionally storing to the register tmp in the loop, then unconditionally > writing the result back afterwards, we dramatically reduce the number of > memory writes, rather than increase them as when this transformation is > applied in isolation. > > > I think the correct fix is not to apply this transformation everywhere, but > to correctly identify those loop cases where it helps and perform the loop > transformation there. i.e. conditional induction variable identification, > hoisting and sinking needs to be improved instead of pessimizing code to a > simpler form that allows our existing flawed passes to trigger. > > > I do very much like the loop-restricted version of this transformation, and > it's impressive impact of HMMR (whose author Sean Eddy is a good friend). > Perhaps Mark might give revised versions of this patch special dispensation > to be applied in stage 3. I'd not expect any correctness issues/bugs, just > performance trade-offs that need to be investigated. Perhaps we should even > apply this patch as is during stage 2, and allow the potential non-loop > performance degradations to be addressed as follow-up patches and therefore > regression fixes suitable for stage 3? I think it should be straight forward to apply this transformation in loops only - requiring analysis that the resulting loads and stores are loop invariant would of course require more work. Given the above restriction we apply already, do you hold your objections or would you consider the "simple" loop improvement? Thanks, Richard.