Re: [PATCH] assorted improvements for fold_truth_andor_1
On Fri, Oct 2, 2020 at 10:43 AM Alexandre Oliva wrote:
>
> Here's what I got so far, passing regstrap except for field-merge-1.c,
> that relies on combining non-adjacent compares, which I haven't
> implemented yet.

Thanks for trying.

> I had to retain some parts of fold_truth_andor_1 to avoid regressions in
> gcc.c-torture/execute/ieee/compare-fp-3.c and gcc.dg/pr81228.c.
> gcc.target/i386/pr37248-2.c and gcc.target/i386/pr37248-3.c required
> turning sign tests into masks.
>
> I moved the field-compare-merging bits of fold_truth_andor_1, and
> auxiliary functions for it, into fold_truth_andor_maybe_separate, though
> the ability to deal with non-adjacent compares is no longer used, at
> least for the time being.
>
> I put fold_truth_andor_maybe_separate in gimple-fold.c, and called it in
> maybe_fold_and_comparisons, but... the fact that it may return a tree
> that isn't a gimple condexpr, e.g. when there is masking or shifting,
> and that requires re-gimplification, suggests moving it to
> tree-ssa-ifcombine.c, where this can be dealt with, possibly even for
> non-adjacent compares.  Right now, there is code in ifcombine to
> re-gimplify results from the regular folder, which I've extended, for
> the time being, to maybe_fold_and_comparisons results as well.

A bit of a band-aid, but let it be this way for now.

> I've reworked decode_field_reference to seek and "substitute" SSA DEFs
> in a way that avoids building new trees, but the test for
> substitutability feels incomplete: there's nothing to ensure that vuses
> aren't brought from much-earlier blocks, and that there couldn't
> possibly be an intervening store.  I suspect I will have to give these
> pieces a little more info for them to be able to tell whether the memory
> accesses, if moved, would still get the same value.  Is there anything
> easily reusable for this sort of test?

Well, the easiest way (and matching the GENERIC folding) is to make sure
the VUSEs are the same.
I didn't quite get the "substitute" code, though.

> As for fields crossing alignment boundaries, the two-tests condition
> currently returned by fold_truth_andor_maybe_separate, that ends up
> gimplified into a new block, causes the chain of ifcombine optimizations
> to stop, something that I've yet to investigate.  My plan is to
> rearrange ifcombine_ifandif to call fold_truth_andor_maybe_separate
> directly, and to handle such resplit conditions by inserting one in each
> of the preexisting blocks, to simplify the gimplification and in
> preparation for combining non-adjacent compares, if we wish to do that.
> Do we?

I guess it's simply because we iterate over a fixed set of BBs?  Or, of
course, if you re-shape conditions rather than just "merging", aka
eliding, one, you might create new opportunities.

> I was convinced that it was a safe optimization, because of the absence
> of intervening side effects and the undefinedness of intervening
> trapping memory accesses, but combining later tests that access a word
> with earlier tests that access the same word may cause intervening
> trapping accesses to be skipped.  Is this a possibility we should enable
> or disable on e.g. a per-language basis, avoid altogether, or ... any
> other suggestions?

ISTR ifcombine checks whether there are side effects (such as trapping)
that might be skipped.  And indeed we shouldn't do such a thing, though
eliding a trap is OK-ish; introducing one is obviously not ;)  We do,
after all, fold 0/x to 0 (just not literal 0/0).

> Richi, is this sort of what you had in mind, or were you thinking of
> something for match.pd or so?  Is match.pd even able to perform
> cross-block multi-condition matches?

So what I see in your patch is matching the condition itself - this part
can be done using match.pd (match ...)
patterns in a much nicer way; see, for example, the use of
gimple_ctz_table_index in tree-ssa-forwprop.c and the corresponding
pattern:

/* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
   The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
   constant which when multiplied by a power of 2 contains a unique value
   in the top 5 or 6 bits.  This is then indexed into a table which maps it
   to the number of trailing zeroes.  */
(match (ctz_table_index @1 @2 @3)
  (rshift (mult (bit_and:c (negate @1) @1) INTEGER_CST@2) INTEGER_CST@3))

match.pd itself doesn't match "control flow"; that is, you'd have to
simplify

  if (c == 1)
    if (c != 5)
      { ... }

as (c == 1) & (c != 5), which match.pd might then simplify further.

> Any further advice as to my plans above?

I do think we eventually want to remove all GENERIC-time handling of
_load_ simplifications.  Note that with match.pd you cannot easily
handle loads, but eventually there are bits that do not involve loads
that can be implemented as match.pd patterns.

Otherwise, thanks for working on this.

Richard.

> Thanks,
>
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index 0cc80ad..47b5419 100644
> ---
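[Editorial note: the (ctz_table_index ...) pattern quoted above matches the classic de Bruijn table idiom for count-trailing-zeros. As a concrete, runnable illustration of that canonical form, here is a C sketch; the table and multiplier are the well-known 32-bit values, not constants taken from this thread.]

```c
#include <assert.h>

/* The canonical form array[((x & -x) * C) >> SHIFT] from the match.pd
   comment above, instantiated with the well-known 32-bit de Bruijn
   constant.  forwprop recognizes this shape via (ctz_table_index ...)
   and can replace it with __builtin_ctz.  */
static const unsigned char ctz_table[32] = {
   0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
  31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
};

static int
table_ctz (unsigned int x)
{
  /* x & -x isolates the lowest set bit; the multiply moves a unique
     5-bit pattern into the top bits, which then indexes the table.  */
  return ctz_table[((x & -x) * 0x077CB531u) >> 27];
}
```

The point of the (match ...) form is that forwprop only has to name the shape once, instead of hand-matching the rshift/mult/bit_and tree.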
Re: [PATCH] assorted improvements for fold_truth_andor_1
Here's what I got so far, passing regstrap except for field-merge-1.c,
that relies on combining non-adjacent compares, which I haven't
implemented yet.

I had to retain some parts of fold_truth_andor_1 to avoid regressions in
gcc.c-torture/execute/ieee/compare-fp-3.c and gcc.dg/pr81228.c.
gcc.target/i386/pr37248-2.c and gcc.target/i386/pr37248-3.c required
turning sign tests into masks.

I moved the field-compare-merging bits of fold_truth_andor_1, and
auxiliary functions for it, into fold_truth_andor_maybe_separate, though
the ability to deal with non-adjacent compares is no longer used, at
least for the time being.

I put fold_truth_andor_maybe_separate in gimple-fold.c, and called it in
maybe_fold_and_comparisons, but... the fact that it may return a tree
that isn't a gimple condexpr, e.g. when there is masking or shifting,
and that requires re-gimplification, suggests moving it to
tree-ssa-ifcombine.c, where this can be dealt with, possibly even for
non-adjacent compares.  Right now, there is code in ifcombine to
re-gimplify results from the regular folder, which I've extended, for
the time being, to maybe_fold_and_comparisons results as well.

I've reworked decode_field_reference to seek and "substitute" SSA DEFs
in a way that avoids building new trees, but the test for
substitutability feels incomplete: there's nothing to ensure that vuses
aren't brought from much-earlier blocks, and that there couldn't
possibly be an intervening store.  I suspect I will have to give these
pieces a little more info for them to be able to tell whether the memory
accesses, if moved, would still get the same value.  Is there anything
easily reusable for this sort of test?

As for fields crossing alignment boundaries, the two-tests condition
currently returned by fold_truth_andor_maybe_separate, which ends up
gimplified into a new block, causes the chain of ifcombine optimizations
to stop, something that I've yet to investigate.
My plan is to rearrange ifcombine_ifandif to call
fold_truth_andor_maybe_separate directly, and to handle such resplit
conditions by inserting one in each of the preexisting blocks, to
simplify the gimplification and in preparation for combining
non-adjacent compares, if we wish to do that.  Do we?

I was convinced that it was a safe optimization, because of the absence
of intervening side effects and the undefinedness of intervening
trapping memory accesses, but combining later tests that access a word
with earlier tests that access the same word may cause intervening
trapping accesses to be skipped.  Is this a possibility we should enable
or disable on e.g. a per-language basis, avoid altogether, or ... any
other suggestions?

Richi, is this sort of what you had in mind, or were you thinking of
something for match.pd or so?  Is match.pd even able to perform
cross-block multi-condition matches?

Any further advice as to my plans above?

Thanks,

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 0cc80ad..47b5419 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -125,7 +125,6 @@ static tree range_predecessor (tree);
 static tree range_successor (tree);
 static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
 static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree);
-static tree unextend (tree, int, int, tree);
 static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
 static tree extract_muldiv_1 (tree, tree, enum tree_code, tree, bool *);
 static tree fold_binary_op_with_conditional_arg (location_t,
@@ -4353,7 +4352,7 @@ invert_truthvalue_loc (location_t loc, tree arg)
    is the original memory reference used to preserve the alias set of
    the access.
   */

-static tree
+tree
 make_bit_field_ref (location_t loc, tree inner, tree orig_inner, tree type,
		    HOST_WIDE_INT bitsize, poly_int64 bitpos,
		    int unsignedp, int reversep)
@@ -4603,132 +4602,6 @@ optimize_bit_field_compare (location_t loc, enum tree_code code,
   return lhs;
 }

-/* Subroutine for fold_truth_andor_1: decode a field reference.
-
-   If EXP is a comparison reference, we return the innermost reference.
-
-   *PBITSIZE is set to the number of bits in the reference, *PBITPOS is
-   set to the starting bit number.
-
-   If the innermost field can be completely contained in a mode-sized
-   unit, *PMODE is set to that mode.  Otherwise, it is set to VOIDmode.
-
-   *PVOLATILEP is set to 1 if the any expression encountered is volatile;
-   otherwise it is not changed.
-
-   *PUNSIGNEDP is set to the signedness of the field.
-
-   *PREVERSEP is set to the storage order of the field.
-
-   *PMASK is set to the mask used.  This is either contained in a
-   BIT_AND_EXPR or derived from the width of the field.
-
-   *PAND_MASK is set to the mask found in a BIT_AND_EXPR, if any.
-
-   Return 0 if this is not a component reference or is one that we can't
-   do anything with.  */
-
-static tree
-decode_field_reference (location_t
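[Editorial note: the comment removed by the hunk above documents what decode_field_reference produces: an innermost object plus *PBITPOS, *PBITSIZE, and *PMASK. A minimal C sketch of what such a decoded (bitpos, bitsize) pair denotes; the helper name is invented for illustration, it is not a GCC function.]

```c
#include <assert.h>

/* Hypothetical helper: re-extract the field that a decoded
   (bitpos, bitsize) pair describes.  The mask "derived from the width
   of the field", as the removed comment puts it, is (1 << bitsize) - 1,
   applied after shifting the inner word down to bit 0.  */
static unsigned int
extract_field (unsigned int word, unsigned int bitpos, unsigned int bitsize)
{
  unsigned int mask = bitsize >= 32 ? ~0u : (1u << bitsize) - 1;
  return (word >> bitpos) & mask;
}
```

So an expression like ((x >> 3) & 0x1f) decodes to inner = x, bitpos = 3, bitsize = 5, and merging two compares amounts to checking whether their decoded fields land in the same word.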
Re: [PATCH] assorted improvements for fold_truth_andor_1
On Tue, Sep 29, 2020 at 3:07 PM Alexandre Oliva wrote:
>
> On Sep 29, 2020, Richard Biener wrote:
>
> > On Tue, Sep 29, 2020 at 9:23 AM Alexandre Oliva wrote:
> >> On Sep 28, 2020, Richard Biener wrote:
>
> > ifcombine should stop using fold*, yeah
>
> Wow, that's quite a lot of work for no expected improvement in codegen.
> I don't expect to be able to justify such an undertaking :-(
>
> > I also think it will not end up using the simplifications using loads.
>
> Yeah, ifcombine's bb_no_side_effects_p gives up on any gimple_vuse in
> the inner block.  That won't do when the whole point is to merge loads
> from memory.
>
> That seems excessive.  Since we rule out any memory-changing side
> effects, I suppose we could get away with checking for volatile operands
> there.  Then, adding just a little SSA_DEF chasing, I believe I could
> bring all of the fold_truth_andor_1 logic I've worked on into ifcombine
> without much difficulty, and then we could do away with at least that
> part of fold_truth_andor.

The current restrictions were for sure to make my life easier at the
start when implementing the pass ;)  Note that you have to watch out for
short-circuited stmts that may trap or invoke undefined behavior at
runtime.

> > Specifically your patch seems to introduce splitting of loads
> > at alignment boundaries
>
> ... when there's another compare involving a load from either side of
> the crossed alignment boundary.  Even on arches that can do unaligned
> loads, the result is no worse, and if there are multiple fields crossing
> consecutive alignment boundaries, the codegen and performance difference
> can be pretty significant.

Ah, OK - I didn't look that closely.

> >> I *think* ifcombine could even be extended so as to reuse the
> >> separate-test logic I put in, by looking for non-immediate dominating
> >> outer conditions for the inner condition.  A further modified version
> >> of fold_truth_andor_1 could then be used to combine the separate
> >> tests.
> > I think the structure of ifcombine doesn't exactly match what
> > fold_truth_andor does
>
> How so?  AFAICT ifcombine_ifandif deals exactly with the (gimplified
> version of the) structure I described in the patch that started the
> thread:
>
> (a.x1 EQNE b.x1) ANDOR (a.y1 EQNE b.y1)

Indeed.

> --
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer
Re: [PATCH] assorted improvements for fold_truth_andor_1
On Sep 29, 2020, Alexandre Oliva wrote:

> Yeah, ifcombine's bb_no_side_effects_p gives up on any gimple_vuse in
> the inner block.  That won't do when the whole point is to merge loads
> from memory.
>
> That seems excessive.  Since we rule out any memory-changing side
> effects, I suppose we could get away with checking for volatile operands
> there.  Then, adding just a little SSA_DEF chasing, I believe I could
> bring all of the fold_truth_andor_1 logic I've worked on into ifcombine
> without much difficulty, and then we could do away with at least that
> part of fold_truth_andor.

Confirmed: a very ugly prototype seems to work!

--
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer
Re: [PATCH] assorted improvements for fold_truth_andor_1
On Sep 29, 2020, Richard Biener wrote:

> On Tue, Sep 29, 2020 at 9:23 AM Alexandre Oliva wrote:
>> On Sep 28, 2020, Richard Biener wrote:

> ifcombine should stop using fold*, yeah

Wow, that's quite a lot of work for no expected improvement in codegen.
I don't expect to be able to justify such an undertaking :-(

> I also think it will not end up using the simplifications using loads.

Yeah, ifcombine's bb_no_side_effects_p gives up on any gimple_vuse in
the inner block.  That won't do when the whole point is to merge loads
from memory.

That seems excessive.  Since we rule out any memory-changing side
effects, I suppose we could get away with checking for volatile operands
there.  Then, adding just a little SSA_DEF chasing, I believe I could
bring all of the fold_truth_andor_1 logic I've worked on into ifcombine
without much difficulty, and then we could do away with at least that
part of fold_truth_andor.

> Specifically your patch seems to introduce splitting of loads
> at alignment boundaries

... when there's another compare involving a load from either side of
the crossed alignment boundary.  Even on arches that can do unaligned
loads, the result is no worse, and if there are multiple fields crossing
consecutive alignment boundaries, the codegen and performance difference
can be pretty significant.

>> I *think* ifcombine could even be extended so as to reuse the
>> separate-test logic I put in, by looking for non-immediate dominating
>> outer conditions for the inner condition.  A further modified version
>> of fold_truth_andor_1 could then be used to combine the separate tests.

> I think the structure of ifcombine doesn't exactly match what
> fold_truth_andor does

How so?  AFAICT ifcombine_ifandif deals exactly with the (gimplified
version of the) structure I described in the patch that started the
thread:

(a.x1 EQNE b.x1) ANDOR (a.y1 EQNE b.y1)

--
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer
Re: [PATCH] assorted improvements for fold_truth_andor_1
On Tue, Sep 29, 2020 at 9:23 AM Alexandre Oliva wrote:
>
> On Sep 28, 2020, Richard Biener wrote:
>
> > On Fri, Sep 25, 2020 at 3:39 PM Alexandre Oliva wrote:
>
> >> This patch introduces various improvements to the logic that merges
> >> field compares.
>
> > Sorry for throwing a wrench in here but doing this kind of transform
> > during GENERIC folding is considered a bad thing.
>
> Ugh.  Even tree-ifcombine itself relies on tree folding logic to perform
> this and various similar sorts of optimizations.

ifcombine should stop using fold*, yeah - there was a start last year
making maybe_fold_and/or_comparisons rely on match.pd patterns instead,
but it didn't go all the way to disabling dispatching to fold.  I also
think it will not end up using the simplifications using loads.

> Is the issue just that we don't want to perform this kind of
> transformation while still in GENERIC, or that the logic must not even
> be there?  I ask because it wouldn't be too hard to disable unwanted
> cases of folding while in GENERIC, and to extend it to follow SSA defs
> so that ifcombine would get those optimizations back at an acceptable
> spot.

One of the issues with these "combine two memory ops using $fancy for
branches" transforms is that they are quite premature, making
optimizations like SRA, value-range propagation, jump threading, etc.
very much harder.  Not to mention diagnostics that rely on the IL being
close to the AST (well, I regularly argue those are broken expectations,
so ;))

> Please help me understand what it is that makes this change as it is
> undesirable, so that I can figure out how to do it in an acceptable way,
> and justify the additional time and effort it will take.

The desire is to move the existing and related transforms
(optimize_bit_field_compare) to GIMPLE.  By making them more powerful,
this task is made more difficult.
Specifically, your patch seems to introduce splitting of loads at
alignment boundaries, which is even more premature, since some targets
would likely prefer a single unaligned load; plus, we try not to deviate
too much in the initial phase of compilation depending on target
capabilities.  It also seems to be something not suited for GENERIC,
which is somehow our high-level AST.

> I *think* ifcombine could even be extended so as to reuse the
> separate-test logic I put in, by looking for non-immediate dominating
> outer conditions for the inner condition.  A further modified version
> of fold_truth_andor_1 could then be used to combine the separate tests.

I think the structure of ifcombine doesn't exactly match what
fold_truth_andor does, but since ifcombine first matches the if
"structure", a common worker could indeed be factored out (though, as
said above, I see little value in retaining fold_truth_andor on
GENERIC).

> As for reassoc...  It doesn't seem a good fit at all for reassociating
> short-circuiting boolean operations, that normally get translated as
> multiple basic blocks.

Yeah - it did accumulate quite some range-test optimizations, but I
think it doesn't cover multiple BBs.

Richard.

> --
> Alexandre Oliva, happy hacker
> https://FSFLA.org/blogs/lxo/
> Free Software Activist
> GNU Toolchain Engineer
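[Editorial note: for readers unfamiliar with the load splitting being debated, here is a sketch in C of the idea: a field of a packed struct that straddles a 4-byte boundary can be read as two aligned word loads recombined in registers. The struct layout, shift amounts, and little-endian assumption are illustrative only, not the patch's actual code generation.]

```c
#include <assert.h>
#include <string.h>

/* f occupies bytes 3..6, crossing the 4-byte alignment boundary.
   (packed is a GCC/Clang extension.)  */
struct __attribute__ ((packed)) P
{
  unsigned char pad[3];
  unsigned int f;
  unsigned char tail;   /* keeps byte 7 in bounds for the sketch */
};

/* Read f as two aligned 4-byte loads, split at the alignment boundary,
   then recombined.  Valid for little-endian layouts only.  */
static unsigned int
read_split (const struct P *p)
{
  unsigned int lo, hi;
  memcpy (&lo, (const unsigned char *) p, 4);      /* bytes 0..3 */
  memcpy (&hi, (const unsigned char *) p + 4, 4);  /* bytes 4..7 */
  return (lo >> 24) | (hi << 8);
}
```

The trade-off Richard raises is visible here: a target with fast unaligned loads would rather issue one misaligned 4-byte load than the two aligned loads plus shifts.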
Re: [PATCH] assorted improvements for fold_truth_andor_1
On Sep 28, 2020, Richard Biener wrote:

> On Fri, Sep 25, 2020 at 3:39 PM Alexandre Oliva wrote:
>> This patch introduces various improvements to the logic that merges
>> field compares.

> Sorry for throwing a wrench in here but doing this kind of transform
> during GENERIC folding is considered a bad thing.

Ugh.  Even tree-ifcombine itself relies on tree folding logic to perform
this and various similar sorts of optimizations.

Is the issue just that we don't want to perform this kind of
transformation while still in GENERIC, or that the logic must not even
be there?  I ask because it wouldn't be too hard to disable unwanted
cases of folding while in GENERIC, and to extend it to follow SSA defs
so that ifcombine would get those optimizations back at an acceptable
spot.

Please help me understand what it is that makes this change as it is
undesirable, so that I can figure out how to do it in an acceptable way,
and justify the additional time and effort it will take.

I *think* ifcombine could even be extended so as to reuse the
separate-test logic I put in, by looking for non-immediate dominating
outer conditions for the inner condition.  A further modified version of
fold_truth_andor_1 could then be used to combine the separate tests.

As for reassoc...  It doesn't seem a good fit at all for reassociating
short-circuiting boolean operations, which normally get translated as
multiple basic blocks.

--
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer
Re: [PATCH] assorted improvements for fold_truth_andor_1
On Fri, Sep 25, 2020 at 3:39 PM Alexandre Oliva wrote:
>
> This patch introduces various improvements to the logic that merges
> field compares.
>
> Before the patch, we could merge:
>
>   (a.x1 EQNE b.x1) ANDOR (a.y1 EQNE b.y1)
>
> into something like:
>
>   (((type *))[Na] & MASK) EQNE (((type *))[Nb] & MASK)
>
> if both of A's fields live within the same alignment boundaries, and
> so do B's, at the same relative positions.  Constants may be used
> instead of the object B.
>
> The initial goal of this patch was to enable such combinations when a
> field crossed alignment boundaries, e.g. for packed types.  We can't
> generally access such fields with a single memory access, so when we
> come across such a compare, we will attempt to combine each access
> separately.
>
> Some merging opportunities were missed because of right-shifts,
> compares expressed as e.g. ((a.x1 ^ b.x1) & MASK) EQNE 0, and
> narrowing conversions, especially after earlier merges.  This patch
> introduces handlers for several cases involving these.
>
> Other merging opportunities were missed because of association.  The
> existing logic would only succeed in merging a pair of consecutive
> compares, or e.g. B with C in (A ANDOR B) ANDOR C, not even trying
> e.g. C and D in (A ANDOR (B ANDOR C)) ANDOR D.  I've generalized the
> handling of the rightmost compare in the left-hand operand, going for
> the leftmost compare in the right-hand operand, and then onto trying
> to merge compares pairwise, one from each operand, even if they are
> not consecutive, taking care to avoid merging operations with
> intervening side effects, including volatile accesses.
>
> When it is the second of a non-consecutive pair of compares that first
> accesses a word, we may merge the first compare with part of the
> second compare that refers to the same word, keeping the compare of
> the remaining bits at the spot where the second compare used to be.
> Handling compares with non-constant fields was somewhat generalized,
> now handling non-adjacent fields.  When a field of one object crosses
> an alignment boundary but the other doesn't, we issue the same load in
> both compares; gimple optimizers will later turn it into a single
> load, without our having to handle SAVE_EXPRs at this point.
>
> The logic for issuing split loads and compares, and ordering them, is
> now shared between all cases of compares with constants and with
> another object.
>
> The -Wno-error for toplev.o on rs6000 is because of toplev.c's:
>
>   if ((flag_sanitize & SANITIZE_ADDRESS)
>       && !FRAME_GROWS_DOWNWARD)
>
> and rs6000.h's:
>
>   #define FRAME_GROWS_DOWNWARD (flag_stack_protect != 0 \
>     || (flag_sanitize & SANITIZE_ADDRESS) != 0)
>
> The mutually exclusive conditions involving flag_sanitize are now
> noticed and reported by fold-const.c's:
>
>   warning (0,
>            "%<and%> of mutually exclusive equal-tests"
>            " is always 0");
>
> This patch enables over 12k compare-merging opportunities that we used
> to miss in a GCC bootstrap.
>
> Regstrapped on x86_64-linux-gnu and ppc64-linux-gnu.  Ok to install?

Sorry for throwing a wrench in here, but doing this kind of transform
during GENERIC folding is considered a bad thing.  There are better
places during GIMPLE opts to replicate (and enhance) what
fold_truth_andor does, namely ifcombine and reassoc, to mention the two
passes that do similar transforms (they are by no means exact matches,
otherwise fold_truth_andor would be gone already).

Richard.

> for gcc/ChangeLog
>
>         * fold-const.c (prepare_xor): New.
>         (decode_field_reference): Handle xor, shift, and narrowing
>         conversions.
>         (all_ones_mask_p): Remove.
>         (compute_split_boundary_from_align): New.
>         (build_split_load, reuse_split_load): New.
>         (fold_truth_andor_1): Add recursion to combine pairs of
>         non-neighboring compares.  Handle xor compared with zero.
>         Handle fields straddling across alignment boundaries.
>         Generalize handling of non-constant rhs.
>         (fold_truth_andor): Leave sub-expression handling to the
>         recursion above.
>         * config/rs6000/t-rs6000 (toplev.o-warn): Disable errors.
>
> for gcc/testsuite/ChangeLog
>
>         * gcc.dg/field-merge-1.c: New.
>         * gcc.dg/field-merge-2.c: New.
>         * gcc.dg/field-merge-3.c: New.
>         * gcc.dg/field-merge-4.c: New.
>         * gcc.dg/field-merge-5.c: New.
> ---
>  gcc/config/rs6000/t-rs6000           |    4
>  gcc/fold-const.c                     |  818 ++++++++++++++++++--------
>  gcc/testsuite/gcc.dg/field-merge-1.c |   64 +++
>  gcc/testsuite/gcc.dg/field-merge-2.c |   31 +
>  gcc/testsuite/gcc.dg/field-merge-3.c |   36 +
>  gcc/testsuite/gcc.dg/field-merge-4.c |   40 ++
>  gcc/testsuite/gcc.dg/field-merge-5.c |   40 ++
>  7 files changed, 882 insertions(+), 151 deletions(-)
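[Editorial note: to make the merge described in the patch submission concrete, here is a C sketch of the before/after semantics of combining two adjacent byte-field compares into one wider masked compare, plus the xor form the patch also learns to recognize. The helper names are invented for the example; the real transform operates on trees, not on source.]

```c
#include <assert.h>
#include <string.h>

struct S { unsigned char x1, y1, z1; };

/* What the source says: two adjacent single-byte compares.  */
static int
cmp_fields (const struct S *a, const struct S *b)
{
  return a->x1 == b->x1 && a->y1 == b->y1;
}

/* What the merged form computes: one two-byte load on each side and a
   masked compare -- the (((type *))[Na] & MASK) EQNE shape from the
   description, written portably with memcpy.  */
static int
cmp_merged (const struct S *a, const struct S *b)
{
  unsigned short wa, wb;
  memcpy (&wa, a, sizeof wa);   /* covers x1 and y1 */
  memcpy (&wb, b, sizeof wb);
  return (wa & 0xffff) == (wb & 0xffff);
}

/* The xor form the patch also recognizes:
   ((a ^ b) & MASK) == 0  is equivalent to  (a & MASK) == (b & MASK).  */
static int
cmp_xor (const struct S *a, const struct S *b)
{
  return ((a->x1 ^ b->x1) & 0xff) == 0
	 && ((a->y1 ^ b->y1) & 0xff) == 0;
}
```

All three predicates agree for every pair of inputs, which is what entitles the folder to rewrite one into another; the win is that the merged form does one load and one compare per object instead of two.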