[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

Richard Biener changed:

           What    |Removed                       |Added
   Status          |NEW                           |ASSIGNED
   Assignee        |unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #29 from Richard Biener ---
Created attachment 40822
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40822&action=edit
patch doing this in SCCVN

This implements the transform in SCCVN instead of in DOM, which makes the
saturated ops recognized in phiopt1.  The other testcases in this bug involve
other peculiarities, like having & 255 instead of a conversion to match, or
having A + B + CST.  This shows that this kind of matching is really a hack...
(or rather, it shows once again that handling CSE and larger expressions that
are associatable is difficult).
[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

--- Comment #28 from Jeffrey A. Law ---
WRT c#26.  Yes, I would agree that finding CSEs that way is rather gross, but
significantly less so than what will be required to address this problem in
phi-opt.  Pattern matching this is going to be significantly more complex than
the ADD_OVERFLOW/SUB_OVERFLOW matching.  I've looked at that code via pr79095
and catching these saturating idioms is significantly more complex.

I prototyped the idea of having DOM do extra lookups for a widened version of
X OP Y.  It's a bit ugly relative to the match.pd approach, but possibly
manageable.  However, doing that in DOM exposes some lameness we'd have to
address in VRP.

Prior to DOM2 we have:

;; basic block 4, loop depth 0, count 0, freq 4665, maybe hot
;;  prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;  pred:       3 [67.6%]  (TRUE_VALUE,EXECUTABLE)
  _6 = (unsigned char) val_12(D);
  _7 = _3 + _6;
  iftmp.1_13 = (int) _7;

We transform that into:

;; basic block 4, loop depth 0, count 0, freq 4665, maybe hot
;;  prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;  pred:       3 [67.6%]  (TRUE_VALUE,EXECUTABLE)
  iftmp.1_13 = _15 & 255;

That's value preserving and clearly an improvement.  Unfortunately we have to
wait until vrp2 to discover that the masking is redundant and simplify the
statement into:

  iftmp.1_13 = _15;

The problem is we do not propagate that copy into the PHI node at BB4's
successor.  By the time we do finally propagate away the copy, there aren't
any additional phi-opt passes to turn things into a MIN/MAX.

The lack of copy propagation when VRP simplifies a statement like that is due
to using op_with_constant_singleton_value_range as the callback for
substitute_and_fold.  op_with_constant_singleton_value_range only returns
exactly what you would think -- constant singleton ranges.  Thus we don't
discover or exploit copy propagation opportunities created by VRP's statement
simplification.
Enhancing the callback to return an SSA_NAME for cases where VRP simplifies
arithmetic/logicals to copies allows the propagation step to propagate the
copy into the PHI node in BB4's successor.  That in turn allows phi-opt to do
its job, and by the .optimized dump we have:

;; basic block 2, loop depth 0, count 0, freq 1, maybe hot
;;  prev block 0, next block 1, flags: (NEW, REACHABLE, VISITED)
;;  pred:       ENTRY [100.0%]  (FALLTHRU,EXECUTABLE)
  _1 = (sizetype) i_9(D);
  _2 = tmp_10(D) + _1;
  _3 = *_2;
  _4 = (int) _3;
  _5 = _4 + val_12(D);
  _16 = MAX_EXPR <_5, 0>;
  iftmp.0_7 = MIN_EXPR <_16, 255>;
  return iftmp.0_7;
;;  succ:       EXIT [100.0%]

Which is what we want at this stage.  Transforming something like that into
saturating arithmetic for processors which support such insns is much easier
(but IMHO out of the scope of this BZ).

Anyway, I'm offline the next few days and largely booked on non-technical
stuff much of March.  I don't know if I'll be able to push this further over
the next few weeks or not.
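[Editorial note: the MAX_EXPR/MIN_EXPR pair in the dump above is the usual
byte-clamping (saturating add) idiom.  A minimal C sketch of that idiom
follows -- this is an illustration, not the PR's exact testcase; the function
name sat_add is hypothetical.]

```c
#include <assert.h>

/* Sketch of the saturating-add idiom discussed above: clamp byte + val
   to the unsigned char range.  After if-conversion the two clamps become
   exactly the MAX_EXPR/MIN_EXPR pair shown in the .optimized dump.  */
unsigned char
sat_add (unsigned char byte, int val)
{
  int sum = byte + val;   /* widened addition, cannot overflow int */
  if (sum < 0)
    sum = 0;              /* becomes MAX_EXPR <sum, 0> */
  if (sum > 255)
    sum = 255;            /* becomes MIN_EXPR <sum, 255> */
  return (unsigned char) sum;
}
```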
[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

--- Comment #27 from Richard Biener ---
The patch misses some "ops = {}" and "ops[1] = NULL_TREE;" initializations
needed to avoid ICEing.
[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

--- Comment #26 from rguenther at suse dot de ---
On Fri, 17 Feb 2017, law at redhat dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397
>
> --- Comment #25 from Jeffrey A. Law ---
> "When doing so allows for simplification" is actually a pretty low bar here.
> We just have to prove the widening cast is a common subexpression.  Once we
> do that, we know widening is a win.  And that's relatively easy to discover.
> You go back to the SSA_NAME_DEF_STMT of the object you want to widen, then
> walk over its immediate uses to see if one is a proper widening op that
> dominates the expression we're trying to simplify.
>
> That's it, nothing else needed to prove profitability.  We only get ~20
> hits in a bootstrap, but I didn't expect this idiom to show up all that
> much.

I find this kind of "technique" of finding CSE opportunities quite ugly
and I'm not sure where you'd actually implement this...

> Trying to do all the pattern matching in phi-opt and ultimately generate
> the min/max is going to be *hideous* because we're trying to do too many
> things at once.  We have components of CSE, VRP and DCE that we'd have to
> bring into phi-opt, which seems wrong to me.

Well, we pattern-match the overflow builtins as well.  I think it would be
quite natural to pattern match saturating operations with the premise to
generate better code for them (either by generic expansion to min/max or by
utilizing CPU instructions -- SSE has saturating ops, for example).  You
don't really need CSE, VRP and DCE; you "simply" pattern match the
comparisons.  You'd then replace the PHI node by the saturated op and let
DCE do the rest for you.

> Contrast that to simplifying the IL like my match.pd pattern does.  We
> make a simple, profitable change to the IL.
> That one profitable change in the IL has ripple effects that allow
> subsequent passes to do their jobs, with the final result that the minmax
> detection code is presented with exactly the IL it knows how to transform.
>
> Alternately (and I haven't prototyped this) we could try again to look at
> this as a redundancy elimination problem.
>
> Given X = A + B in DOM, where A comes from a narrowed operand (A').  Look
> up a widened B in the expression table (B').  If found, then look up
> A' + B'.  If found (lhs), then replace X = A + B with
> X = (typeof X) ((lhs) & mask).
>
> That's essentially doing the same thing as the prototype match.pd pattern,
> except that we know the operand widening as well as the widened arithmetic
> are both CSEs.

CSE already does similar stuff for loads, transparently adding
VIEW_CONVERT_EXPRs.  I've added generic helpers for this
(vn_nary_build_or_lookup).  The question is if it's worth it.  You'd
basically add to the list of special patterns in visit_nary_op where we
already handle ops in different sign (you'd add "op in different width" and
possibly add two stmts instead of one -- the conversion and the masking).
Oh, this happens to be a patch in my local tree only (I no longer remember
the reason for inventing it, but surely the question is how many of these
special cases are worth the extra lookups):

Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c        (revision 245417)
+++ gcc/tree-ssa-sccvn.c        (working copy)
@@ -3453,17 +3493,78 @@ visit_copy (tree lhs, tree rhs)
 static bool
 visit_nary_op (tree lhs, gimple *stmt)
 {
-  bool changed = false;
   tree result = vn_nary_op_lookup_stmt (stmt, NULL);
-  if (result)
-    changed = set_ssa_val_to (lhs, result);
-  else
-    {
-      changed = set_ssa_val_to (lhs, lhs);
-      vn_nary_op_insert_stmt (stmt, lhs);
+  if (result)
+    return set_ssa_val_to (lhs, result);
+  else if (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+           && (TYPE_PRECISION (TREE_TYPE (lhs))
+               == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (lhs))))
+           && is_gimple_assign (stmt))
+    {
+      /* For operations on integers that have the same result independent
+         of the signedness of their operands try to lookup a result with
+         opposite sign.  */
+      enum tree_code code = gimple_assign_rhs_code (stmt);
+      switch (code)
+        {
+        case NEGATE_EXPR:
+          {
+            tree type = TREE_TYPE (lhs);
+            type = signed_or_unsigned_type_for (! TYPE_UNSIGNED (type), type);
+            tree ops[3];
+            ops[0] = gimple_assign_rhs1 (stmt);
+            ops[0] = vn_nary_op_lookup_pieces (1, NOP_EXPR, type, ops, NULL);
+            if (ops[0])
+              {
+                ops[0] = vn_nary_op_lookup_pieces (1, code, type,
+                                                   ops, NULL);
+                if (ops[0])
+                  {
+                    result = vn_nary_build_or_lookup (NOP_EXPR,
+                                                      TREE_TYPE (lhs), ops);
+
[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

--- Comment #25 from Jeffrey A. Law ---
"When doing so allows for simplification" is actually a pretty low bar here.
We just have to prove the widening cast is a common subexpression.  Once we do
that, we know widening is a win.  And that's relatively easy to discover.  You
go back to the SSA_NAME_DEF_STMT of the object you want to widen, then walk
over its immediate uses to see if one is a proper widening op that dominates
the expression we're trying to simplify.

That's it, nothing else needed to prove profitability.  We only get ~20 hits
in a bootstrap, but I didn't expect this idiom to show up all that much.

--

Trying to do all the pattern matching in phi-opt and ultimately generate the
min/max is going to be *hideous* because we're trying to do too many things at
once.  We have components of CSE, VRP and DCE that we'd have to bring into
phi-opt, which seems wrong to me.

Contrast that to simplifying the IL like my match.pd pattern does.  We make a
simple, profitable change to the IL.  That one profitable change has ripple
effects that allow subsequent passes to do their jobs, with the final result
that the minmax detection code is presented with exactly the IL it knows how
to transform.

--

Alternately (and I haven't prototyped this) we could try again to look at this
as a redundancy elimination problem.

Given X = A + B in DOM, where A comes from a narrowed operand (A').  Look up a
widened B in the expression table (B').  If found, then look up A' + B'.  If
found (lhs), then replace X = A + B with X = (typeof X) ((lhs) & mask).

That's essentially doing the same thing as the prototype match.pd pattern,
except that we know the operand widening as well as the widened arithmetic are
both CSEs.
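[Editorial note: the identity behind the DOM lookup idea above -- narrow
arithmetic equals the widened arithmetic masked back down -- can be checked
with a small C sketch.  This is my illustration, not GCC code; the function
names are hypothetical.]

```c
#include <assert.h>

/* X = A + B computed directly in the narrow (unsigned char) type.  */
int
narrow_add (unsigned char a, unsigned char b)
{
  return (unsigned char) (a + b);
}

/* The same value via the widened operands A', B' and the replacement
   form X = (typeof X) ((lhs) & mask) described in the comment above.  */
int
widened_masked_add (unsigned char a, unsigned char b)
{
  int wa = a, wb = b;        /* A', B': the widened (CSE'd) operands */
  return (wa + wb) & 255;    /* widened add, masked back to 8 bits */
}
```

If the widened add wa + wb already exists as a common subexpression, the
rewrite trades a narrow add for a mask of an existing value, which is why
profitability hinges on the widening casts being CSEs.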
[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

--- Comment #24 from Richard Biener ---
(In reply to Jeffrey A. Law from comment #23)
> The model of shortening as much as possible for gimple, then widening to
> word mode at the gimple/RTL boundary, is probably too simplistic.  We
> really need the ability to widen when doing so allows for simplification.

But the key point here is "when doing so allows for simplification" ...

> That's what the prototype pattern I posted was meant to show.  By widening
> we can expose the common subexpressions, and once the common subexpressions
> are exposed, min/max can do its job.  The problem is we don't have range
> data, so we have to do masking of the result, nor do we know if the widened
> operand is a common subexpression or not.  Thus profitability is unknown at
> transformation time.
>
> I was about to write off VRP again because it lacks much of the CSE
> capability we want, but VRP has the information to know that the result is
> value preserving if done in the wider mode.  So at VRP time we could do the
> widening without needing to mask the result, at which point widening is
> known to be profitable.  As written we have two casts + addition.  VRP
> could turn that into an addition with a single cast (both of which are
> common subexpressions, but VRP doesn't need to detect that to ensure
> profitability).
>
> DOM/FRE have CSE infrastructure, but the range data presented to them is
> not path sensitive and thus useless in this case.

Note that mid-term I'd like our value-numbering infrastructure (VRP is also a
kind of value-numbering) to improve to a point where we simply do VRP
alongside CSE, constant, copy and known-zero/one-bits propagation.  You
"simply" track multiple lattices (configurable) from a common iteration
(configurable as either non-iterating, pessimistically handling backedges, or
iterating, optimistically handling them).
> I'm hesitant to try to do all this in phi-opt -- phi-opt would have to look
> through the casts, prove equivalences, etc.  ick.

Yeah, ICK.  But still, this case is very much about pattern matching (and
getting min/max in the end).  It _is_ a saturating operation, which might be
common enough to pattern-match (heh, some targets even have instruction
support for them, and the middle-end has saturating types!).  So short-term,
pattern matching in phiopt is the way to go (I'd consider that even as a
regression fix).
[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

--- Comment #23 from Jeffrey A. Law ---
The model of shortening as much as possible for gimple, then widening to word
mode at the gimple/RTL boundary, is probably too simplistic.  We really need
the ability to widen when doing so allows for simplification.

That's what the prototype pattern I posted was meant to show.  By widening we
can expose the common subexpressions, and once the common subexpressions are
exposed, min/max can do its job.  The problem is we don't have range data, so
we have to do masking of the result, nor do we know if the widened operand is
a common subexpression or not.  Thus profitability is unknown at
transformation time.

I was about to write off VRP again because it lacks much of the CSE capability
we want, but VRP has the information to know that the result is value
preserving if done in the wider mode.  So at VRP time we could do the widening
without needing to mask the result, at which point widening is known to be
profitable.  As written we have two casts + addition.  VRP could turn that
into an addition with a single cast (both of which are common subexpressions,
but VRP doesn't need to detect that to ensure profitability).

DOM/FRE have CSE infrastructure, but the range data presented to them is not
path sensitive and thus useless in this case.

I'm hesitant to try to do all this in phi-opt -- phi-opt would have to look
through the casts, prove equivalences, etc.  ick.
[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

Richard Biener changed:

           What            |Removed |Added
   Target Milestone        |5.5     |7.0

--- Comment #22 from Richard Biener ---
I think the pattern to identify is the *tmp + val saturating addition.  One of
the issues is that we have tmp[i] + val in two forms:

  _5 = (int)tmp[i] + val;

and

  iftmp.1_13 = (int)(tmp[i] + (unsigned char)val);

where the latter is guarded with an overflow check (_5 <= 255 && _5 >= 0).
IMHO the unfortunate thing is that we arrive at this inconsistent narrowing
here in the first place, before we had a chance to clean things up.  Of
course, as Jakub notes in comment #1, the user could have written it this way
themselves.  And _only_ if the no-overflow condition holds are the two
expressions the same (contrary to Jakub's example, which compares both values
as unsigned char).

As suggested above, we may want to do "saturated OP" pattern matching here and
then re-expand in a more optimal form (using min/max as suggested, or using an
IFN like we have for the overflow cases).  Another option is to teach VRP to
detect the equivalence (but that looks hard and even more ad-hoc).  As for
using CSE in some way, it would need to see the range of _5 and conditionally
add (int)(unsigned char)expr-of-_5 simplified and hope things get detected
that way... (but first of all it would need a value-range lattice).

So the only "reasonable" suggestion is pattern matching the whole thing, which
fits phiopt.  I do not think that having a general 'widening' pattern is the
way to go; that has the same issues as the current narrowing one.  Moving the
narrowing optimization down from the FEs' folding code is a step in the right
direction anyway (but it would need putting elsewhere to not "regress").

I think that all but maybe a saturating-OP replacement in phiopt is GCC 8
material.  Adjusting the target milestone to GCC 7 meanwhile.
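[Editorial note: the two forms and the role of the overflow check can be seen
in a small C sketch.  This is my illustration of the comment above, not the
PR's testcase; form1/form2 are hypothetical names.]

```c
#include <assert.h>

/* First form: the addition done widened to int,
   corresponding to  _5 = (int)tmp[i] + val;  */
int
form1 (unsigned char byte, int val)
{
  return (int) byte + val;
}

/* Second form: the addition done narrowed to unsigned char,
   corresponding to  iftmp.1_13 = (int)(tmp[i] + (unsigned char)val);
   The two forms agree exactly when the no-overflow condition
   _5 <= 255 && _5 >= 0 holds for the widened sum.  */
int
form2 (unsigned char byte, int val)
{
  return (int) (unsigned char) (byte + (unsigned char) val);
}
```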
[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

Jeffrey A. Law changed:

           What    |Removed |Added
   CC             |        |law at redhat dot com

--- Comment #21 from Jeffrey A. Law ---
Couldn't we solve this with a pattern like this in match.pd:

(simplify
 (convert (plus:c @0 (convert @1)))
 (if (INTEGRAL_TYPE_P (TREE_TYPE (@1))
      && INTEGRAL_TYPE_P (TREE_TYPE (@0))
      && INTEGRAL_TYPE_P (type)
      && types_match (@1, type)
      && (TYPE_PRECISION (TREE_TYPE (@1)) > TYPE_PRECISION (TREE_TYPE (@0))))
  (bit_and (plus (convert @0) @1)
           { wide_int_to_tree (type,
                               wi::mask (TYPE_PRECISION (TREE_TYPE (@0)),
                                         false, TYPE_PRECISION (type))); })))

Essentially this widens the arithmetic, then masks off undesirable bits in
the result.  As written, I think this doesn't consistently reduce the number
of expressions, even though it helps the testcase.  The reason it helps the
testcase is that the (convert @0) in the output pattern turns out to be a
common subexpression.  If (convert @0) is CSE-able, then this does reduce the
number of expressions.  Though we probably want to avoid widening beyond a
target's word size (which I loathe to test for).

Thoughts anyone?
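[Editorial note: in C terms, the rewrite this pattern performs can be
sketched as below -- my illustration with hypothetical names, taking @0 as
unsigned char and @1 as int, so the mask is 255.]

```c
#include <assert.h>

/* Input shape of the pattern, (convert (plus:c @0 (convert @1))):
   the wide operand is narrowed, added, and the sum widened back.  */
int
before (unsigned char a, int b)
{
  return (int) (unsigned char) (a + (unsigned char) b);
}

/* Output shape, (bit_and (plus (convert @0) @1) mask):
   widen the narrow operand, add in the wide type, mask the result.  */
int
after (unsigned char a, int b)
{
  return ((int) a + b) & 255;
}
```

The win, as noted above, materializes only when (int) a already exists as a
common subexpression; otherwise the rewrite trades one conversion for another
plus a mask.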
[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

Richard Biener changed:

           What            |Removed |Added
   Target Milestone        |4.9.4   |5.5

--- Comment #20 from Richard Biener ---
GCC 4.9 branch is being closed.