On 08/29/2017 03:13 AM, Christophe Lyon wrote:
> Hi Jeff,
> 
> 
> On 29 August 2017 at 07:07, Jeff Law <l...@redhat.com> wrote:
>> This is a two part patchkit to improve DOM's ability to derive constant
>> equivalences that arise as a result of traversing a particular edge in
>> the CFG.
>>
>> Until now we only allowed a single NAME = NAME|CONST equivalence to be
>> associated with an edge in the CFG.  Patch #1 generalizes that code so
>> that we can record multiple simple equivalences on an edge.  Much like
>> expression equivalences, we just shove them into a vec and iterate on
>> the vec in the appropriate places.
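
A minimal sketch of the "multiple simple equivalences per edge" idea, for readers who want something concrete. It is not the code in tree-ssa-dom.c: `edge_info_sketch`, `value` and `lookup` are hypothetical names, and plain strings stand in for GCC's `tree` nodes.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SSA name or constant; GCC itself uses `tree`.
using value = std::string;

// Toy model of per-edge information holding any number of simple
// NAME = NAME|CONST equivalences instead of exactly one.
class edge_info_sketch
{
public:
  // Record that LHS == RHS holds whenever this edge is traversed.
  void record_simple_equiv (const value &lhs, const value &rhs)
  {
    simple_equivalences.emplace_back (lhs, rhs);
  }

  std::vector<std::pair<value, value>> simple_equivalences;
};

// Consumers walk the whole vector rather than assuming a single entry.
// Returns the recorded replacement for NAME, or NAME itself if none.
value lookup (const edge_info_sketch &info, const value &name)
{
  for (const auto &equiv : info.simple_equivalences)
    if (equiv.first == name)
      return equiv.second;
  return name;
}
```

With this shape, recording `a_1 = 0` and `b_2 = 0` on the same edge is just two calls to `record_simple_equiv`, and each consumer loops over the vector.
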
>>
>> Patch #2 has the interesting bits.
>>
>> Back in the gcc-7 effort I added the ability to look at the operands of
>> a BIT_IOR_EXPR that had a zero result.  In that case each operand of the
>> BIT_IOR must have a zero value.  This was to address a missed
>> optimization regression bug during stage4.
>>
>> The plan was always to add analogous BIT_AND support, but I didn't feel
>> like handling BIT_AND was appropriate at the time (no bz entry and no
>> regressions related to that capability).
>>
>> I'd also had the sense that further improvements could be made here. For
>> example, it is common for the BIT_IOR or BIT_AND to be fed by a
>> comparison and we ought to be able to record the result of the
>> comparison.  If the comparison happened to be an equality test, then we
>> may ultimately derive a constant for one operand of the equality test as
>> well.
>>
>> It also seemed like the NOP/CONVERT_EXPR handling could be incorporated
>> into such a change.
>>
>> So I pulled together some instrumentation.  Lots of things generate
>> equivalences -- but a much smaller subset of those equivalences are
>> ultimately useful.
>>
>> Probably the most surprising was BIT_XOR, which allows us to generate
>> all kinds of equivalences, but none that were useful for ultimate
>> simplification in any of the tests I looked at.
>>
>>
>> The most subtle was COND_EXPRs.  We might have something like
>>
>> res = (a != 5) ? x : 1;
>>
>>
>> We can't actually derive anything useful for "a" here, even if we know
>> the result is one.  That's because "x" could have the value 1.  So you
>> end up only being able to derive equivalences for COND_EXPRs when both
>> arms have a constant value.  That restriction dramatically reduces the
>> utility of handling COND_EXPR -- to the point where I'm not including it.
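
The COND_EXPR restriction can be checked with a small sketch. The function names are hypothetical, and the second function uses constant arms 2 and 1 purely as an assumed illustration of the "both arms constant" case.

```cpp
#include <cassert>

// The COND_EXPR from the text: res = (a != 5) ? x : 1;
int cond_expr (int a, int x)
{
  return (a != 5) ? x : 1;
}

// With a non-constant arm, res == 1 is reachable both with a == 5 and
// with a != 5 (when x happens to be 1), so nothing follows for `a`.
bool result_is_ambiguous ()
{
  return cond_expr (5, 0) == 1 && cond_expr (7, 1) == 1;
}

// Assumed variant with two distinct constant arms: here res == 1 does
// pin down a == 5, and res == 2 pins down a != 5.
int cond_expr_const_arms (int a)
{
  return (a != 5) ? 2 : 1;
}
```

Only the second form lets a known result be mapped back to a fact about `a`.
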
>>
>> So what we end up with is BIT_AND/BIT_IOR, conversions, plus/minus,
>> comparisons and neg/not.
>>
>> So when we determine that a particular SSA_NAME has a constant value, we
>> look at the defining statement and essentially try to derive a value for
>> the input operand(s) based on knowing the result value.  If we can
>> derive a constant value for an input operand, we record that value and
>> recurse.
>>
>> In cases where we walk backwards to a condition, we record the
>> condition in the available expression table.
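
A rough sketch of the backward walk described above, on a toy IR rather than GIMPLE. Everything here (`op`, `def_stmt`, `derive`, the depth limit of 8) is a hypothetical illustration, not the interface of `edge_info::derive_equivalences`. It assumes the operands of the BIT_AND are boolean (0 or 1), uses small values so the arithmetic cannot overflow, and leaves out the available expression table entirely.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy statement kinds: x | y, x & y, x + CST, x == CST.
enum class op { bit_ior, bit_and, plus_const, eq_const };

struct def_stmt
{
  op code;
  std::string op0;  // First operand (always a name in this sketch).
  std::string op1;  // Second operand name, used by the bit operations.
  long cst;         // Constant operand, used by plus_const and eq_const.
};

using defs = std::map<std::string, def_stmt>;  // Name -> defining statement.
using equivs = std::map<std::string, long>;    // Name -> derived constant.

// Given NAME == VAL, record it, then look at NAME's defining statement and
// try to derive constants for its inputs, recursing on each success.
void derive (const defs &d, const std::string &name, long val,
             equivs &out, int depth = 0)
{
  out[name] = val;
  auto it = d.find (name);
  if (it == d.end () || depth > 8)  // No definition, or arbitrary cut-off.
    return;

  const def_stmt &s = it->second;
  switch (s.code)
    {
    case op::bit_ior:   // (x | y) == 0 implies x == 0 and y == 0.
      if (val == 0)
        {
          derive (d, s.op0, 0, out, depth + 1);
          derive (d, s.op1, 0, out, depth + 1);
        }
      break;
    case op::bit_and:   // For 0/1 operands, (x & y) == 1 implies both are 1.
      if (val == 1)
        {
          derive (d, s.op0, 1, out, depth + 1);
          derive (d, s.op1, 1, out, depth + 1);
        }
      break;
    case op::plus_const:  // (x + CST) == val implies x == val - CST.
      derive (d, s.op0, val - s.cst, out, depth + 1);
      break;
    case op::eq_const:    // (x == CST) being true implies x == CST.
      if (val == 1)
        derive (d, s.op0, s.cst, out, depth + 1);
      break;
    }
}
```

For example, with `t = c & p`, `c = (a == 5)` and `p = q + 3`, knowing `t == 1` on an edge yields `c == 1`, `p == 1`, `a == 5` and `q == -2`.
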
>>
>>
>> The code is written such that if we find cases where the equivalences
>> for other nodes are useful, they're easy to add.
>>
>>
>> These equivalences are most useful to the threader, but I've seen them
>> help in other cases as well.  There's a half-dozen or so new tests
>> reduced from GCC itself.
>>
>> Bootstrapped and regression tested on x86_64, lightly tested on ppc64le
>> via bootstrapping and running the new tests to verify they do the right
>> thing on a !logical_op_short_circuit target.
>>
>> Installing on the trunk.
>>
>> Jeff
>>
>>
>> commit 506ac60cacbc4c4e5e166513ea83c1d2e14eaf3b
>> Author: law <law@138bc75d-0d04-0410-961f-82ee72b054a4>
>> Date:   Tue Aug 29 05:03:22 2017 +0000
>>
>>             * tree-ssa-dom.c (class edge_info): Changed from a struct
>>             to a class.  Add ctor/dtor, methods and data members.
>>             (edge_info::edge_info): Renamed from allocate_edge_info.
>>             Initialize additional members.
>>             (edge_info::~edge_info): New.
>>             (free_dom_edge_info): Delete the edge info.
>>             (record_edge_info): Use new class & associated member functions.
>>             Tighten forms for testing for edge equivalences.
>>             (record_temporary_equivalences): Iterate over the simple
>>             equivalences rather than assuming there's only one per edge.
>>             (cprop_into_successor_phis): Iterate over the simple
>>             equivalences rather than assuming there's only one per edge.
>>             (optimize_stmt): Use operand_equal_p rather than pointer
>>             equality for mini-DSE code.
>>
[ snip ]

>> commit a370df2c52074abbb044d1921a0c7df235758050
>> Author: law <law@138bc75d-0d04-0410-961f-82ee72b054a4>
>> Date:   Tue Aug 29 05:03:36 2017 +0000
>>
>>             * tree-ssa-dom.c (edge_info::record_simple_equiv): Call
>>             derive_equivalences.
>>             (derive_equivalences_from_bit_ior, record_temporary_equivalences):
>>             Code moved into....
>>             (edge_info::derive_equivalences): New private member function.
>>
>>             * gcc.dg/torture/pr57214.c: Fix type of loop counter.
>>             * gcc.dg/tree-ssa/ssa-sink-16.c: Disable DOM.
>>             * gcc.dg/tree-ssa/ssa-dom-thread-11.c: New test.
>>             * gcc.dg/tree-ssa/ssa-dom-thread-12.c: New test.
>>             * gcc.dg/tree-ssa/ssa-dom-thread-13.c: New test.
>>             * gcc.dg/tree-ssa/ssa-dom-thread-14.c: New test.
>>             * gcc.dg/tree-ssa/ssa-dom-thread-15.c: New test.
>>             * gcc.dg/tree-ssa/ssa-dom-thread-16.c: New test.
>>             * gcc.dg/tree-ssa/ssa-dom-thread-17.c: New test.
>>
>>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@251397 138bc75d-0d04-0410-961f-82ee72b054a4
>>
> 
> Three of the new tests fail on arm-none-linux-gnueabihf configured with
> --with-cpu=cortex-a15 --with-fpu=vfpv3-d16-fp16:
> 
> FAIL:    gcc.dg/tree-ssa/ssa-dom-thread-11.c scan-tree-dump-times dom2 "Threaded" 1
> FAIL:    gcc.dg/tree-ssa/ssa-dom-thread-14.c scan-tree-dump-times dom2 "Threaded" 1
> FAIL:    gcc.dg/tree-ssa/ssa-dom-thread-16.c scan-tree-dump-times dom2 "Threaded" 1
> 
> They do pass when configuring for cpu cortex-a9/a15 and fpu
> neon-fp16/neon-vfpv4.
> 
> I do not have the dumps since it's automated testing; let me know if
> you need me to reproduce it manually and extract the dumps.
Thanks.  -11 and -16 are fairly sensitive to branch costing so I'm not
terribly surprised to find out we're going to need to adjust the target
selectors a bit more.  I'll look into what's going on with -14 as well.

Thanks,
jeff
