On Fri, Feb 19, 2016 at 10:01 PM, Jeff Law <l...@redhat.com> wrote:
> On 02/18/2016 02:56 AM, Richard Biener wrote:
>>>>
>>>> Just a short quick comment - the above means you only handle partial
>>>> stores with no intervening uses.  You don't handle, say
>>>>
>>>> struct S { struct R { int x; int y; } r; int z; } s;
>>>>
>>>>    s = { {1, 2}, 3 };
>>>>    s.r.x = 1;
>>>>    s.r.y = 2;
>>>>    struct R r = s.r;
>>>>    s.z = 3;
>>>>
>>>> where s = { {1, 2}, 3} is still dead.
>>>
>>>
>>> Right.  But handling that has never been part of DSE's design goals. Once
>>> there's a use, DSE has always given up.
>>
>>
>> Yeah, which is why I in the end said we need a "better" DSE ...
>
> So I cobbled up a quick test for this.  I only handle assignments which may
> reference the same memory as the currently tracked store.  Obviously that
> could be extended to handle certain builtins and the like.
>
> It triggers a bit here and there.  While looking at those cases it occurred
> to me that we could also look at this as a failure earlier in the
> optimization pipeline.  In fact DOM already has code to handle a closely
> related situation.
>
> When DOM sees a store to memory, it creates a new fake statement with the
> RHS and LHS reversed.  So in the case above DOM creates statements that look
> like:
>
> 1 = s.r.x
> 2 = s.r.y
>
> DOM then puts the RHS into the available expression table as equivalent to
> the LHS.  So if it finds a later load of s.r.x, it will replace that load
> with "1".  I haven't looked at it in a while, but it certainly was
> functional prior to the tuple merge.

It still is functional, that's how it CSEs across stores.
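
To make that concrete, here is a rough sketch of the effect on the example
above (not actual GCC code; the temporary 't' is made up for illustration):

  s.r.x = 1;        /* DOM records that s.r.x now holds 1 */
  s.r.y = 2;        /* DOM records that s.r.y now holds 2 */
  int t = s.r.x;    /* a later load of s.r.x is replaced by the constant 1 */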

> Presumably DOM is not looking at r = s.r and realizing it could look up s.r
> piece-wise in the available expression table.  If it did, it would
> effectively turn that fragment into:
>
>     s = { {1, 2}, 3 };
>     s.r.x = 1;
>     s.r.y = 2;
>     struct R r = {1, 2};
>     s.z = 3;
>
> At which point we no longer have the may-read of s.r.{x,y} and DSE would see
> the initial assignment as dead.

Yeah, but if it does not become dead you've just increased code size or lifetime...

Looking up an aggregate component-wise is also quite expensive (which components
are you looking for?).
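
For the r = s.r case that piecewise lookup would have to query the table
once per field of struct R, roughly (hypothetical, there is no such DOM
code today):

  r.x <- lookup (s.r.x)   /* hits the "1 = s.r.x" record */
  r.y <- lookup (s.r.y)   /* hits the "2 = s.r.y" record */

and that per-component walk is where the cost comes from.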

> I'm not sure if it's advisable to teach DOM how to look up structure
> references piecewise or not.  The code to handle this case in DSE is
> relatively simple, so perhaps we just go with the DSE variant.

FRE does something related - it looks at all piecewise uses of 'r' and
eventually replaces them with pieces of s.r when seeing the r = s.r
aggregate assignment.  Of course that only makes the store to r dead if there
are no uses of it left.
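
Roughly, on the example above (a sketch of the effect only, not of the FRE
internals):

  struct R r = s.r;
  ... r.x ... r.y ...   /* FRE can turn these uses into s.r.x / s.r.y,
                           i.e. into the known values 1 and 2 */

and only once no use of r remains does the aggregate copy r = s.r itself
become dead.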

> I also looked a bit at cases where we find that while an entire store (such
> as an aggregate initialization or mem*) may not be dead, pieces of the store
> may be dead.   That's trivial to detect.   It triggers relatively often.
> The trick is that, once detected, we have to go back and rewrite the original
> statement to store only the live parts.  I've only written the detection
> code; the rewriting might be somewhat painful.

Yes.  I think SRA has all the code to do that though; see how it
does scalarization of constant pool loads like

  aggr = *.LC1;

SRA has the additional limitation of only handling aggregate decls
that don't have their address taken because it basically assumes
no aliasing and just analyzes all accesses in the whole function.
So it has no notion of access ordering.

But its core data structures and routines would be suitable for
building up accesses and doing replacements if you do a more
conscious, alias-aware analysis of them.  So if you don't use a simple
bitmap but instead share 'access' with the SRA code, it may help
you do the required transform.
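
For reference, the partial-store rewrite Jeff describes above would amount
to something like this (just a sketch; 'foo' is a made-up call that keeps
s live):

  s = { {1, 2}, 3 };   /* s.r.x and s.r.y are overwritten below, s.z is not */
  s.r.x = 4;
  s.r.y = 5;
  foo (&s);

where only the s.z part of the initializer is live, so the store would be
trimmed to

  s.z = 3;
  s.r.x = 4;
  s.r.y = 5;
  foo (&s);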

> I'm starting to wonder if what we have is a 3-part series.
>
> [1/3] The basic tracking to handle 33562, possibly included in gcc-6
> [2/3] Ignore reads that reference stuff not in live_bytes for gcc-7
> [3/3] Detect partially dead aggregate stores, rewriting the partially
>       dead store to only store the live bytes.  Also for gcc-7.
>
>
> Obviously [1/3] would need compile-time benchmarking, but I really don't
> expect any issues there.

So what are the overall statistics for [1/3] if you exclude the clobbers?

Richard.

> Jeff
