Re: [PATCH] assorted improvements for fold_truth_andor_1

2020-10-06 Thread Richard Biener via Gcc-patches
On Fri, Oct 2, 2020 at 10:43 AM Alexandre Oliva  wrote:
>
> Here's what I got so far, passing regstrap except for field-merge-1.c,
> that relies on combining non-adjacent compares, which I haven't
> implemented yet.

Thanks for trying.

> I had to retain some parts of fold_truth_andor_1 to avoid regressions in
> gcc.c-torture/execute/ieee/compare-fp-3.c and gcc.dg/pr81228.c.
> gcc.target/i386/pr37248-2.c and gcc.target/i386/pr37248-3.c required
> turning sign tests into masks.
>
> I moved the field-compare-merging bits of fold_truth_andor_1, and
> auxiliary functions for it, into fold_truth_andor_maybe_separate, though
> the ability to deal with non-adjacent compares is no longer used, at
> least for the time being.
>
> I put fold_truth_andor_maybe_separate in gimple-fold.c, and called it in
> maybe_fold_and_comparisons, but...  the fact that it may return a tree
> that isn't a gimple condexpr, e.g. when there is masking or shifting,
> and that requires re-gimplification, suggests moving it to
> tree-ssa-ifcombine.c where this can be dealt with, possibly even for
> non-adjacent compares.  Right now, there is code in ifcombine to
> re-gimplify results from the regular folder, which I've extended, for
> the time being, to maybe_fold_and_comparisons results as well.

A bit of a band-aid, but let it be this way for now.

> I've reworked decode_field_reference to seek and "substitute" SSA DEFs
> in a way that avoids building new trees, but the test for
> substitutability feels incomplete: there's nothing to ensure that vuses
> aren't brought from much-earlier blocks, and that there couldn't
> possibly be an intervening store.  I suspect I will have to give these
> pieces a little more info for it to be able to tell whether the memory
> accesses, if moved, would still get the same value.  Is there anything
> easily reusable for this sort of test?

Well, the easiest way (and matching the GENERIC folding) is to
make sure the VUSEs are the same.  I didn't quite get the "substitute"
code though.

>
> As for fields crossing alignment boundaries, the two-tests condition
> currently returned by fold_truth_andor_maybe_separate, that ends up
> gimplified into a new block, causes the chain of ifcombine optimizations
> to stop, something that I'm yet to investigate.  My plan is to rearrange
> ifcombine_ifandif to call fold_truth_andor_maybe_separate directly, and
> to handle such resplit conditions by inserting one in each of the
> preexisting blocks, to simplify the gimplification and in preparation
> for combining non-adjacent compares, if we wish to do that.  Do we?

I guess it's simply because we iterate over a fixed set of BBs?  Or, of course,
if you re-shape conditions rather than just "merging" (aka eliding) one, you
might create new opportunities.

>  I
> was convinced that it was a safe optimization, because of the absence of
> intervening side effects and the undefinedness of intervening trapping
> memory accesses, but combining later tests that access a word with
> earlier tests that access the same word may cause intervening trapping
> accesses to be skipped.  Is this a possibility we should enable or
> disable on e.g. a per-language basis, avoid altogether, or ...  any
> other suggestions?

ISTR ifcombine checks whether there are side-effects (such as trapping)
that might be skipped.  And indeed we shouldn't do such a thing, though
eliding a trap is OK-ish; introducing one is obviously not ;)  We do,
after all, fold 0/x to 0 (just not literal 0/0).

>
> Richi, is this sort of what you had in mind, or were you thinking of
> something for match.pd or so?  Is match.pd even able to perform
> cross-block multi-condition matches?

So what I see in your patch is matching the condition itself - this
part can be done using match.pd (match ...) patterns in a much
nicer way, see for example the use of gimple_ctz_table_index
in tree-ssa-forwprop.c and the corresponding

/* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
   The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
   constant which when multiplied by a power of 2 contains a unique value
   in the top 5 or 6 bits.  This is then indexed into a table which maps it
   to the number of trailing zeroes.  */
(match (ctz_table_index @1 @2 @3)
  (rshift (mult (bit_and:c (negate @1) @1) INTEGER_CST@2) INTEGER_CST@3))

match.pd itself doesn't match "control flow"; that is, you'd have to simplify
  if (c == 1) if (c != 5) { ... }
as (c == 1) & (c != 5), which match.pd might then fold further (here to c == 1).

> Any further advice as to my plans above?

I do think we eventually want to remove all GENERIC-time handling of _load_
simplifications.  Note that with match.pd you cannot easily handle loads, but
eventually there's bits that do not involve loads that can be implemented
as match.pd patterns.

Otherwise thanks for working on this.

Richard.

> Thanks,
>
>
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index 0cc80ad..47b5419 100644
> --- 

Re: [PATCH] assorted improvements for fold_truth_andor_1

2020-10-02 Thread Alexandre Oliva
Here's what I got so far, passing regstrap except for field-merge-1.c,
that relies on combining non-adjacent compares, which I haven't
implemented yet.

I had to retain some parts of fold_truth_andor_1 to avoid regressions in
gcc.c-torture/execute/ieee/compare-fp-3.c and gcc.dg/pr81228.c.
gcc.target/i386/pr37248-2.c and gcc.target/i386/pr37248-3.c required
turning sign tests into masks.

I moved the field-compare-merging bits of fold_truth_andor_1, and
auxiliary functions for it, into fold_truth_andor_maybe_separate, though
the ability to deal with non-adjacent compares is no longer used, at
least for the time being.

I put fold_truth_andor_maybe_separate in gimple-fold.c, and called it in
maybe_fold_and_comparisons, but...  the fact that it may return a tree
that isn't a gimple condexpr, e.g. when there is masking or shifting,
and that requires re-gimplification, suggests moving it to
tree-ssa-ifcombine.c where this can be dealt with, possibly even for
non-adjacent compares.  Right now, there is code in ifcombine to
re-gimplify results from the regular folder, which I've extended, for
the time being, to maybe_fold_and_comparisons results as well.

I've reworked decode_field_reference to seek and "substitute" SSA DEFs
in a way that avoids building new trees, but the test for
substitutability feels incomplete: there's nothing to ensure that vuses
aren't brought from much-earlier blocks, and that there couldn't
possibly be an intervening store.  I suspect I will have to give these
pieces a little more info for it to be able to tell whether the memory
accesses, if moved, would still get the same value.  Is there anything
easily reusable for this sort of test?

As for fields crossing alignment boundaries, the two-tests condition
currently returned by fold_truth_andor_maybe_separate, that ends up
gimplified into a new block, causes the chain of ifcombine optimizations
to stop, something that I'm yet to investigate.  My plan is to rearrange
ifcombine_ifandif to call fold_truth_andor_maybe_separate directly, and
to handle such resplit conditions by inserting one in each of the
preexisting blocks, to simplify the gimplification and in preparation
for combining non-adjacent compares, if we wish to do that.  Do we?  I
was convinced that it was a safe optimization, because of the absence of
intervening side effects and the undefinedness of intervening trapping
memory accesses, but combining later tests that access a word with
earlier tests that access the same word may cause intervening trapping
accesses to be skipped.  Is this a possibility we should enable or
disable on e.g. a per-language basis, avoid altogether, or ...  any
other suggestions?

Richi, is this sort of what you had in mind, or were you thinking of
something for match.pd or so?  Is match.pd even able to perform
cross-block multi-condition matches?

Any further advice as to my plans above?

Thanks,


diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 0cc80ad..47b5419 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -125,7 +125,6 @@ static tree range_predecessor (tree);
 static tree range_successor (tree);
 static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
 static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, 
tree);
-static tree unextend (tree, int, int, tree);
 static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
 static tree extract_muldiv_1 (tree, tree, enum tree_code, tree, bool *);
 static tree fold_binary_op_with_conditional_arg (location_t,
@@ -4353,7 +4352,7 @@ invert_truthvalue_loc (location_t loc, tree arg)
is the original memory reference used to preserve the alias set of
the access.  */
 
-static tree
+tree
 make_bit_field_ref (location_t loc, tree inner, tree orig_inner, tree type,
HOST_WIDE_INT bitsize, poly_int64 bitpos,
int unsignedp, int reversep)
@@ -4603,132 +4602,6 @@ optimize_bit_field_compare (location_t loc, enum 
tree_code code,
   return lhs;
 }
 
-/* Subroutine for fold_truth_andor_1: decode a field reference.
-
-   If EXP is a comparison reference, we return the innermost reference.
-
-   *PBITSIZE is set to the number of bits in the reference, *PBITPOS is
-   set to the starting bit number.
-
-   If the innermost field can be completely contained in a mode-sized
-   unit, *PMODE is set to that mode.  Otherwise, it is set to VOIDmode.
-
-   *PVOLATILEP is set to 1 if the any expression encountered is volatile;
-   otherwise it is not changed.
-
-   *PUNSIGNEDP is set to the signedness of the field.
-
-   *PREVERSEP is set to the storage order of the field.
-
-   *PMASK is set to the mask used.  This is either contained in a
-   BIT_AND_EXPR or derived from the width of the field.
-
-   *PAND_MASK is set to the mask found in a BIT_AND_EXPR, if any.
-
-   Return 0 if this is not a component reference or is one that we can't
-   do anything with.  */
-
-static tree
-decode_field_reference (location_t 

Re: [PATCH] assorted improvements for fold_truth_andor_1

2020-09-30 Thread Richard Biener via Gcc-patches
On Tue, Sep 29, 2020 at 3:07 PM Alexandre Oliva  wrote:
>
> On Sep 29, 2020, Richard Biener  wrote:
>
> > On Tue, Sep 29, 2020 at 9:23 AM Alexandre Oliva  wrote:
>
> >> On Sep 28, 2020, Richard Biener  wrote:
>
> > ifcombine should stop using fold*, yeah
>
> Wow, that's quite a lot of work for no expected improvement in codegen.
> I don't expect to be able to justify such an undertaking :-(
>
> > I also think it will not end up using the simplifications using loads.
>
> Yeah, ifcombine's bb_no_side_effects_p gives up on any gimple_vuse in
> the inner block.  That won't do when the whole point is to merge loads
> from memory.
>
> That seems excessive.  Since we rule out any memory-changing side
> effects, I suppose we could get away with checking for volatile operands
> there.  Then, adding just a little SSA_DEF chasing, I believe I could
> bring all of the fold_truth_andor_1 logic I've worked on into ifcombine
> without much difficulty, and then we could do away with at least that
> part of fold_truth_andor.

The current restrictions were for sure to make my life easier at start
when implementing the pass ;)  Note that you have to watch out
for short-circuited stmts that may trap or invoke undefined behavior
at runtime.

> > Specifically your patch seems to introduce splitting of loads
> > at alignment boundaries
>
> ... when there's another compare involving a load from either side of
> the crossed alignment boundary.  Even on arches that can do unaligned
> loads, the result is no worse, and if there are multiple fields crossing
> consecutive alignment boundaries, the codegen and performance difference
> can be pretty significant.

Ah, OK - I didn't look that closely.

> >> I *think* ifcombine could even be extended so as to reuse the
> >> separate-test logic I put in, by looking for non-immediate dominating
> >> outer conditions for the inner condition.  A further modified version of
> >> fold_truth_andor_1 could then be used to combine the separate tests.
>
> > I think the structure of ifcombine doesn't exactly match what
> > fold_truth_andor does
>
> How so?  AFAICT ifcombine_ifandif deals exactly with the (gimplified
> version of the) structure I described in the patch that started the
> thread:
>
>   (a.x1 EQNE b.x1)  ANDOR  (a.y1 EQNE b.y1)

Indeed.

>
> --
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer


Re: [PATCH] assorted improvements for fold_truth_andor_1

2020-09-29 Thread Alexandre Oliva
On Sep 29, 2020, Alexandre Oliva  wrote:

> Yeah, ifcombine's bb_no_side_effects_p gives up on any gimple_vuse in
> the inner block.  That won't do when the whole point is to merge loads
> from memory.

> That seems excessive.  Since we rule out any memory-changing side
> effects, I suppose we could get away with checking for volatile operands
> there.  Then, adding just a little SSA_DEF chasing, I believe I could
> bring all of the fold_truth_andor_1 logic I've worked on into ifcombine
> without much difficulty, and then we could do away with at least that
> part of fold_truth_andor.

Confirmed, a very ugly prototype seems to work!

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


Re: [PATCH] assorted improvements for fold_truth_andor_1

2020-09-29 Thread Alexandre Oliva
On Sep 29, 2020, Richard Biener  wrote:

> On Tue, Sep 29, 2020 at 9:23 AM Alexandre Oliva  wrote:

>> On Sep 28, 2020, Richard Biener  wrote:

> ifcombine should stop using fold*, yeah

Wow, that's quite a lot of work for no expected improvement in codegen.
I don't expect to be able to justify such an undertaking :-(

> I also think it will not end up using the simplifications using loads.

Yeah, ifcombine's bb_no_side_effects_p gives up on any gimple_vuse in
the inner block.  That won't do when the whole point is to merge loads
from memory.

That seems excessive.  Since we rule out any memory-changing side
effects, I suppose we could get away with checking for volatile operands
there.  Then, adding just a little SSA_DEF chasing, I believe I could
bring all of the fold_truth_andor_1 logic I've worked on into ifcombine
without much difficulty, and then we could do away with at least that
part of fold_truth_andor.


> Specifically your patch seems to introduce splitting of loads
> at alignment boundaries

... when there's another compare involving a load from either side of
the crossed alignment boundary.  Even on arches that can do unaligned
loads, the result is no worse, and if there are multiple fields crossing
consecutive alignment boundaries, the codegen and performance difference
can be pretty significant.



>> I *think* ifcombine could even be extended so as to reuse the
>> separate-test logic I put in, by looking for non-immediate dominating
>> outer conditions for the inner condition.  A further modified version of
>> fold_truth_andor_1 could then be used to combine the separate tests.

> I think the structure of ifcombine doesn't exactly match what
> fold_truth_andor does

How so?  AFAICT ifcombine_ifandif deals exactly with the (gimplified
version of the) structure I described in the patch that started the
thread:

  (a.x1 EQNE b.x1)  ANDOR  (a.y1 EQNE b.y1)


-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


Re: [PATCH] assorted improvements for fold_truth_andor_1

2020-09-29 Thread Richard Biener via Gcc-patches
On Tue, Sep 29, 2020 at 9:23 AM Alexandre Oliva  wrote:
>
> On Sep 28, 2020, Richard Biener  wrote:
>
> > On Fri, Sep 25, 2020 at 3:39 PM Alexandre Oliva  wrote:
>
> >> This patch introduces various improvements to the logic that merges
> >> field compares.
>
> > Sorry for throwing a wrench in here but doing this kind of transform
> > during GENERIC folding is considered a bad thing.
>
> Ugh.  Even tree-ifcombine itself relies on tree folding logic to perform
> this and various similar sorts of optimizations.

ifcombine should stop using fold*, yeah - there was a start last year
making maybe_fold_and/or_comparisons rely on match.pd patterns
instead, but it didn't go all the way in disabling dispatching to
fold.  I also think it will not end up using the simplifications
involving loads.

> Is the issue just that we don't want to perform this kind of
> transformation while still in GENERIC, or that the logic must not even
> be there?  I ask because it wouldn't be too hard to disable unwanted
> cases of folding while in GENERIC, and to extend it to follow SSA defs
> so that ifcombine would get those optimizations back at an acceptable spot.

One of of the issues of these "combine two memory ops using $fancy
for branches" transforms is that they are quite premature, making
optimizations like SRA, value-range propagation, jump threading, etc.
very much harder.  Not to mention diagnostics that rely on the IL
being close to the AST (well, I regularly argue those are broken
expectations, so ;))

> Please help me understand what it is that makes this change as it is
> undesirable, so that I can figure out how to do it in an acceptable way,
> and justify the additional time and effort it will take.

The desire is to move the existing and related transforms
(optimize_bit_field_compare) to GIMPLE.  By making them
more powerful this task is made more difficult.

Specifically, your patch seems to introduce splitting of loads
at alignment boundaries, which is even more premature, since some
targets would likely prefer a single unaligned load; plus we try
not to deviate too much in the initial phase of compilation
depending on target capabilities.  It also seems to be something
not suited for GENERIC, which is somehow our high-level AST.

> I *think* ifcombine could even be extended so as to reuse the
> separate-test logic I put in, by looking for non-immediate dominating
> outer conditions for the inner condition.  A further modified version of
> fold_truth_andor_1 could then be used to combine the separate tests.

I think the structure of ifcombine doesn't exactly match what
fold_truth_andor does but since ifcombine first matches
the if "structure" a common worker could be indeed factored out
(but as said above, I see little value in retaining fold_truth_andor
on GENERIC).

>
> As for reassoc...  It doesn't seem a good fit at all for reassociating
> short-circuiting boolean operations, that normally get translated as
> multiple basic blocks.

Yeah - it did accumulate quite some range test optimizations but
I think it doesn't cover multiple BBs.

Richard.

> --
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer


Re: [PATCH] assorted improvements for fold_truth_andor_1

2020-09-29 Thread Alexandre Oliva
On Sep 28, 2020, Richard Biener  wrote:

> On Fri, Sep 25, 2020 at 3:39 PM Alexandre Oliva  wrote:

>> This patch introduces various improvements to the logic that merges
>> field compares.

> Sorry for throwing a wrench in here but doing this kind of transform
> during GENERIC folding is considered a bad thing.

Ugh.  Even tree-ifcombine itself relies on tree folding logic to perform
this and various similar sorts of optimizations.

Is the issue just that we don't want to perform this kind of
transformation while still in GENERIC, or that the logic must not even
be there?  I ask because it wouldn't be too hard to disable unwanted
cases of folding while in GENERIC, and to extend it to follow SSA defs
so that ifcombine would get those optimizations back at an acceptable spot.

Please help me understand what it is that makes this change as it is
undesirable, so that I can figure out how to do it in an acceptable way,
and justify the additional time and effort it will take.


I *think* ifcombine could even be extended so as to reuse the
separate-test logic I put in, by looking for non-immediate dominating
outer conditions for the inner condition.  A further modified version of
fold_truth_andor_1 could then be used to combine the separate tests.


As for reassoc...  It doesn't seem a good fit at all for reassociating
short-circuiting boolean operations, that normally get translated as
multiple basic blocks.

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


Re: [PATCH] assorted improvements for fold_truth_andor_1

2020-09-28 Thread Richard Biener via Gcc-patches
On Fri, Sep 25, 2020 at 3:39 PM Alexandre Oliva  wrote:
>
>
> This patch introduces various improvements to the logic that merges
> field compares.
>
> Before the patch, we could merge:
>
>   (a.x1 EQNE b.x1)  ANDOR  (a.y1 EQNE b.y1)
>
> into something like:
>
>   (((type *))[Na] & MASK) EQNE (((type *))[Nb] & MASK)
>
> if both of A's fields live within the same alignment boundaries, and
> so do B's, at the same relative positions.  Constants may be used
> instead of the object B.
>
> The initial goal of this patch was to enable such combinations when a
> field crossed alignment boundaries, e.g. for packed types.  We can't
> generally access such fields with a single memory access, so when we
> come across such a compare, we will attempt to combine each access
> separately.
>
> Some merging opportunities were missed because of right-shifts,
> compares expressed as e.g. ((a.x1 ^ b.x1) & MASK) EQNE 0, and
> narrowing conversions, especially after earlier merges.  This patch
> introduces handlers for several cases involving these.
>
> Other merging opportunities were missed because of association.  The
> existing logic would only succeed in merging a pair of consecutive
> compares, or e.g. B with C in (A ANDOR B) ANDOR C, not even trying
> e.g. C and D in (A ANDOR (B ANDOR C)) ANDOR D.  I've generalized the
> handling of the rightmost compare in the left-hand operand, going for
> the leftmost compare in the right-hand operand, and then onto trying
> to merge compares pairwise, one from each operand, even if they are
> not consecutive, taking care to avoid merging operations with
> intervening side effects, including volatile accesses.
>
> When it is the second of a non-consecutive pair of compares that first
> accesses a word, we may merge the first compare with part of the
> second compare that refers to the same word, keeping the compare of
> the remaining bits at the spot where the second compare used to be.
>
> Handling compares with non-constant fields was somewhat generalized,
> now handling non-adjacent fields.  When a field of one object crosses
> an alignment boundary but the other doesn't, we issue the same load in
> both compares; gimple optimizers will later turn it into a single
> load, without our having to handle SAVE_EXPRs at this point.
>
> The logic for issuing split loads and compares, and ordering them, is
> now shared between all cases of compares with constants and with
> another object.
>
> The -Wno-error for toplev.o on rs6000 is because of toplev.c's:
>
>   if ((flag_sanitize & SANITIZE_ADDRESS)
>   && !FRAME_GROWS_DOWNWARD)
>
> and rs6000.h's:
>
> #define FRAME_GROWS_DOWNWARD (flag_stack_protect != 0   \
>   || (flag_sanitize & SANITIZE_ADDRESS) != 0)
>
> The mutually exclusive conditions involving flag_sanitize are now
> noticed and reported by fold-const.c's:
>
>   warning (0,
>"% of mutually exclusive equal-tests"
>" is always 0");
>
> This patch enables over 12k compare-merging opportunities that we used
> to miss in a GCC bootstrap.
>
> Regstrapped on x86_64-linux-gnu and ppc64-linux-gnu.  Ok to install?

Sorry for throwing a wrench in here but doing this kind of transform
during GENERIC folding is considered a bad thing.

There are better places during GIMPLE opts to replicate (and enhance)
what fold_truth_andor does, namely ifcombine and reassoc, to mention
the two passes that do similar transforms (they are by no means exact
matches, otherwise fold_truth_andor would be gone already).

Richard.

>
> for  gcc/ChangeLog
>
> * fold-const.c (prepare_xor): New.
> (decode_field_reference): Handle xor, shift, and narrowing
> conversions.
> (all_ones_mask_p): Remove.
> (compute_split_boundary_from_align): New.
> (build_split_load, reuse_split_load): New.
> (fold_truth_andor_1): Add recursion to combine pairs of
> non-neighboring compares.  Handle xor compared with zero.
> Handle fields straddling across alignment boundaries.
> Generalize handling of non-constant rhs.
> (fold_truth_andor): Leave sub-expression handling to the
> recursion above.
> * config/rs6000/t-rs6000 (toplev.o-warn): Disable errors.
>
> for  gcc/testsuite/ChangeLog
>
> * gcc.dg/field-merge-1.c: New.
> * gcc.dg/field-merge-2.c: New.
> * gcc.dg/field-merge-3.c: New.
> * gcc.dg/field-merge-4.c: New.
> * gcc.dg/field-merge-5.c: New.
> ---
>  gcc/config/rs6000/t-rs6000   |4
>  gcc/fold-const.c |  818 
> --
>  gcc/testsuite/gcc.dg/field-merge-1.c |   64 +++
>  gcc/testsuite/gcc.dg/field-merge-2.c |   31 +
>  gcc/testsuite/gcc.dg/field-merge-3.c |   36 +
>  gcc/testsuite/gcc.dg/field-merge-4.c |   40 ++
>  gcc/testsuite/gcc.dg/field-merge-5.c |   40 ++
>  7 files changed, 882 insertions(+), 151 deletions(-)
>