[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions

2017-02-24 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org

--- Comment #29 from Richard Biener  ---
Created attachment 40822
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40822=edit
patch doing this in SCCVN

This instead of in DOM implements it in SCCVN which makes the saturated ops
recognized in phiopt1.

The other testcases in this bug involve other peculiarities like having
& 255 instead of a conversion to match or having A + B + CST.  This shows
that this kind of matching is really a hack... (or well, it show for another
time that handling CSE and larger expressions that are associatable is
difficult).

[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions

2017-02-23 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

--- Comment #28 from Jeffrey A. Law  ---
WRT c#26.  

Yes, I would agree that finding CSE's that way is rather gross, but
significantly less so than what will be required to address this problem in
phi-opt.

Pattern matching this is going to be significantly more complex than the
ADD_OVERFLOW/SUB_OVERFLOW.  I've looked at that code via pr79095 and catching
these saturating idioms is significantly more complex.

I prototyped the idea of having DOM do extra lookups for a widened version of X
OP Y.  It's a bit ugly relative to the match.pd approach, but possibly
manageable.

However, doing that in DOM exposes some lameness we'd have to address in VRP.

Prior to DOM2 we have:

;   basic block 4, loop depth 0, count 0, freq 4665, maybe hot
;;prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;pred:   3 [67.6%]  (TRUE_VALUE,EXECUTABLE)
  _6 = (unsigned char) val_12(D);
  _7 = _3 + _6;
  iftmp.1_13 = (int) _7;

We transform that into:

;;   basic block 4, loop depth 0, count 0, freq 4665, maybe hot
;;prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;pred:   3 [67.6%]  (TRUE_VALUE,EXECUTABLE)
  iftmp.1_13 = _15 & 255

That's value preserving and clearly an improvement.  Unfortunately we have to
wait until vrp2 to discover that the masking is redundant and simplify the
statement into:


iftmp.1_13 = _15

The problem is we do not propagate that copy into the PHI node at BB4's
successor.  By the time we do finally propagate away the copy, there aren't any
additional phi-opt passes to turn things into a MIN/MAX.

THe lack of copy propagation when VRP simplifies a statement like that is due
to using op_with_constant_singleton_value_range as the callback for
substitute_and_fold.  op_with_constant_singleton_value_range only returns
exactly what you would think -- constant singleton ranges.  Thus we don't
discover or exploit copy propagation opportunities created by VRP's statement
simplification.

Enhancing the callback to return an SSA_NAME for cases were VRP simplifies
arithmetic/logicals to copies allows the propagation step to propagate the copy
into the PHI node in BB4's successor.

That in turn allows phi-opt to do its job and by the .optimized dump we have:

;;   basic block 2, loop depth 0, count 0, freq 1, maybe hot
;;prev block 0, next block 1, flags: (NEW, REACHABLE, VISITED)
;;pred:   ENTRY [100.0%]  (FALLTHRU,EXECUTABLE)
  _1 = (sizetype) i_9(D);
  _2 = tmp_10(D) + _1;
  _3 = *_2;
  _4 = (int) _3;
  _5 = _4 + val_12(D);
  _16 = MAX_EXPR <_5, 0>;
  iftmp.0_7 = MIN_EXPR <_16, 255>;
  return iftmp.0_7;
;;succ:   EXIT [100.0%]


Which is what we want at this stage.  Transforming something like that into
saturating arithmetic for processors which support such insns is much easier
(but IMHO out of the scope of this BZ).

Anyway, I'm offline the next few days and largely booked on non-technical stuff
much of March.  I don't know if I'll be able to push this further over the next
few weeks or not.

[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions

2017-02-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

--- Comment #27 from Richard Biener  ---
patch misses some ops = {} and ops[1] = NULL_TREE; to avoid ICEing.

[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions

2017-02-20 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

--- Comment #26 from rguenther at suse dot de  ---
On Fri, 17 Feb 2017, law at redhat dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397
> 
> --- Comment #25 from Jeffrey A. Law  ---
> "When doing so allows for simplification" is actually a pretty low bar here. 
> We just have to prove the widening cast is a common subexpression.  Once we do
> that, we know widening is a win.  And that's relatively easy to discover.  You
> go back to the SSA_NAME_DEF_STMT of the object you want to widen, then walk
> over its immediate uses to see if one is a proper widening op that dominates
> the expression we're trying to simplify. 
> 
> That's it, nothing else needed to prove profitability.  We only get ~20 hits 
> in
> a bootstrap, but I didn't expect this idiom to show up all that much.

I find this kind of "technique" of finding CSE opportunities quite ugly
and I'm not sure where you'd actually implement this...

> 
> --
> 
> Trying to do all the pattern matching in phi-opt and ultimately generate the
> min/max is going to be *hideous* because we're trying to do too  many things 
> at
> once.  We have components of CSE, VRP and DCE that we'd have to bring into
> phi-opt, which seems wrong to me.

Well, we are pattern matching the overflow builtins as well.  I think
it would be quite natural to pattern match saturating operations with
the premise to generate better code for them (either by generic expansion
to min/max or by utilizing CPU instructions -- I think SSE has saturating
ops for example).  You don't really need CSE, VRP and DCE, you "simply"
pattern match the comparisons.  You'd then replace the PHI node by
the saturated op and let DCE do the rest for you.

> Contrast that to simplifying the IL like my match.pd pattern does.   We make a
> simple, profitable, change to the IL.  That one profitable change in the IL 
> has
> ripple effects that allow subsequent passes to do their jobs with the final
> result that the minmax detection code is presented with exactly the IL it 
> knows
> how to transform.
> 
> --
> 
> Alternately (and I haven't prototyped this) we could try again to look at this
> as a redundancy elimination problem.
> 
> Given X = A + B in DOM, where A comes from a narrowed operand (A').  Lookup a
> widened B in the expression table (B').  If found, then lookup A' + B'.  If
> found (lhs), then replace X = A + B with X = (typeof X) ((lhs) & mask).
> 
> That's essentially doing the same thing as the prototype match.pd pattern. 
> Except that we know the operand widening as well as the widened arithmetic are
> both CSEs.

CSE already does similar stuff for loads, transparently adding 
VIEW_CONVERT_EXPRs.  I've added generic helpers for this
(vn_nary_build_or_lookup).  The question is if it's worth it.
You'd basically add to the list of special patterns in visit_nary_op
where we already handle ops in different sign (you'd add "op in different
width" and possibly add two stmts instead of one - the conversion and
the masking).

Oh, happens to be a patch in my local tree only (no longer remember
what was the reason to invent it but surely the question is how
many of these special-cases are worth the extra lookups)

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 245417)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -3453,17 +3493,78 @@ visit_copy (tree lhs, tree rhs)
 static bool
 visit_nary_op (tree lhs, gimple *stmt)
 {
-  bool changed = false;
   tree result = vn_nary_op_lookup_stmt (stmt, NULL);
-
   if (result)
-changed = set_ssa_val_to (lhs, result);
-  else
-{
-  changed = set_ssa_val_to (lhs, lhs);
-  vn_nary_op_insert_stmt (stmt, lhs);
+return set_ssa_val_to (lhs, result);
+  else if (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+  && (TYPE_PRECISION (TREE_TYPE (lhs))
+  == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (lhs
+  && is_gimple_assign (stmt))
+{
+  /* For operations on integers that have the same result independent
+ of the signedness of their operands try to lookup a result with
+opposite sign.  */
+  enum tree_code code = gimple_assign_rhs_code (stmt);
+  switch (code)
+   {
+   case NEGATE_EXPR:
+ {
+   tree type = TREE_TYPE (lhs);
+   type = signed_or_unsigned_type_for (! TYPE_UNSIGNED (type), 
type);
+   tree ops[3];
+   ops[0] = gimple_assign_rhs1 (stmt);
+   ops[0] = vn_nary_op_lookup_pieces (1, NOP_EXPR, type, ops, 
NULL);
+   if (ops[0])
+ {
+   ops[0] = vn_nary_op_lookup_pieces (1, code, type,
+  ops, NULL);
+   if (ops[0])
+ {
+   result = vn_nary_build_or_lookup (NOP_EXPR,
+ TREE_TYPE (lhs), 
ops);
+ 

[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions

2017-02-17 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

--- Comment #25 from Jeffrey A. Law  ---
"When doing so allows for simplification" is actually a pretty low bar here. 
We just have to prove the widening cast is a common subexpression.  Once we do
that, we know widening is a win.  And that's relatively easy to discover.  You
go back to the SSA_NAME_DEF_STMT of the object you want to widen, then walk
over its immediate uses to see if one is a proper widening op that dominates
the expression we're trying to simplify. 

That's it, nothing else needed to prove profitability.  We only get ~20 hits in
a bootstrap, but I didn't expect this idiom to show up all that much.


--

Trying to do all the pattern matching in phi-opt and ultimately generate the
min/max is going to be *hideous* because we're trying to do too  many things at
once.  We have components of CSE, VRP and DCE that we'd have to bring into
phi-opt, which seems wrong to me.

Contrast that to simplifying the IL like my match.pd pattern does.   We make a
simple, profitable, change to the IL.  That one profitable change in the IL has
ripple effects that allow subsequent passes to do their jobs with the final
result that the minmax detection code is presented with exactly the IL it knows
how to transform.

--

Alternately (and I haven't prototyped this) we could try again to look at this
as a redundancy elimination problem.

Given X = A + B in DOM, where A comes from a narrowed operand (A').  Lookup a
widened B in the expression table (B').  If found, then lookup A' + B'.  If
found (lhs), then replace X = A + B with X = (typeof X) ((lhs) & mask).

That's essentially doing the same thing as the prototype match.pd pattern. 
Except that we know the operand widening as well as the widened arithmetic are
both CSEs.

[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions

2017-02-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

--- Comment #24 from Richard Biener  ---
(In reply to Jeffrey A. Law from comment #23)
> The model of shortening as much as possible for gimple, then widening to
> word mode at the gimple/RTL boundary is probably too simplistic.  We really
> need the ability to widen when doing so allows for simplification.

But the key point here is "when doing so allows for simplification" ...

> That's what the prototype pattern I posted was meant to show.  By widening
> we can expose the common subexpressions, and once the common subexpressions
> are exposed, min/max can do its job.  The problem is we don't have range
> data, so we have to do masking of the result, nor do we know if the widened
> operand is a common subexpression or not.  Thus profitability is unknown at
> transformation time.
> 
> I was about to write off VRP again because it lacks much of the CSE
> capability we want, but VRP has the information to know that the result is
> value preserving if done in the wider mode.  So at VRP time we could do the
> widening without needing to mask the result, at which point widening is
> known to be profitable.  As written we have two casts + addition.  VRP could
> turn that into a addition with a single cast (both of which are common
> subexpressions, but VRP doesn't need to detect that to ensure profitability).
> 
> DOM/FRE have CSE infrastructure, but the range data presented to them is not
> path sensitive and thus useless in this case.

Note that mid-term I'd like our value-numbering infrastructure (VRP is also
a kind of value-numbering) to improve to a point where we simply do VRP
alongside CSE, constant, copy and known-zero/one-bits propagation.  You
"simply" track multiple lattices (configurable) from a common iteration
(configurable as either non-iterating, pessimistically handling backedges,
or iterating, optimistically handling them).

> I'm hesitant to try to do all this in phi-opt -- phi-opt would have to look
> through the casts, prove equivalences, etc.  ick.

Yeah, ICK.  But still this case is very much about pattern matching
(and getting min/max in the end).  It _is_ a saturating operation which
might be common enough to pattern-match (heh, some targets even have
instruction support for them and the middle-end has saturating types!)

So short-term pattern matching in phiopt is the way to go (I'd consider
that even as a regression fix).

[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions

2017-02-16 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

--- Comment #23 from Jeffrey A. Law  ---
The model of shortening as much as possible for gimple, then widening to word
mode at the gimple/RTL boundary is probably too simplistic.  We really need the
ability to widen when doing so allows for simplification.

That's what the prototype pattern I posted was meant to show.  By widening we
can expose the common subexpressions, and once the common subexpressions are
exposed, min/max can do its job.  The problem is we don't have range data, so
we have to do masking of the result, nor do we know if the widened operand is a
common subexpression or not.  Thus profitability is unknown at transformation
time.

I was about to write off VRP again because it lacks much of the CSE capability
we want, but VRP has the information to know that the result is value
preserving if done in the wider mode.  So at VRP time we could do the widening
without needing to mask the result, at which point widening is known to be
profitable.  As written we have two casts + addition.  VRP could turn that into
a addition with a single cast (both of which are common subexpressions, but VRP
doesn't need to detect that to ensure profitability).

DOM/FRE have CSE infrastructure, but the range data presented to them is not
path sensitive and thus useless in this case.

I'm hesitant to try to do all this in phi-opt -- phi-opt would have to look
through the casts, prove equivalences, etc.  ick.

[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions

2017-02-16 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|5.5 |7.0

--- Comment #22 from Richard Biener  ---
I think the pattern to identify is the *tmp + val saturating addition.  One of
the issues is that we have tmp[i] + val in two forms:

  _5 = (int)tmp[i] + val;

and

  _iftmp.1_13 = (int)(tmp[i] + (unsigned char)val);

where the latter is guarded with an overflow check (_5 <= 255 && _5 >= 0).

IMHO the unfortunate thing is that we arrive with this inconsistent
narrowing here in the first place before we had a chance to clean things up.
Of course as Jakub notices in comment#1 the user could have written this
in this way.  And _only_ if the no-overflow condition holds are the two
expressions the same (contrary to Jakubs example that compares both values
as unsigned char).

As suggested above we may want to do "saturated OP" pattern matching here
and then re-expand in more optimal form (using min/max as suggested or
using an IFN like we have for the overflow cases).

Another option is to teach VRP to detect the equivalence (but that looks hard
and even more ad-hoc).  As for using CSE in some way it would need to see
the range of _5 and conditionally add (int)(unsigned char)expr-of-_5 simplified
and hope things get detected that way... (but first of all it would need a
value-range lattice).

So the only "reasonable" suggestion is pattern matching the whole thing
which fits phiopt.

I do not think that having a 'widening' general pattern is the way to go.
That has the same issues as the current narrowing one.

Moving down the narrowing optimization from the FEs folding code is a step
in the right direction anyway (but it would need putting elsewhere to not
"regress").

I think that all but maybe a saturating-OP replacement in phiopt is GCC 8
material.  Adjusting target milestone to GCC 7 meanwhile.

[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions

2017-02-14 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at redhat dot com

--- Comment #21 from Jeffrey A. Law  ---
Couldn't we solve this with a pattern like this in match.pd:

(simplify
  (convert (plus:c @0 (convert @1)))
  (if (INTEGRAL_TYPE_P (TREE_TYPE (@1))
   && INTEGRAL_TYPE_P (TREE_TYPE (@0))
   && INTEGRAL_TYPE_P (type)
   && types_match (@1, type)
   && (TYPE_PRECISION (TREE_TYPE (@1)) > TYPE_PRECISION (TREE_TYPE (@0
(bit_and (plus (convert @0) @1)
   { wide_int_to_tree
(type, wi::mask (TYPE_PRECISION (TREE_TYPE (@0)), false,
TYPE_PRECISION (type))); })))

Essentially this widens the arithmetic, then masks off undesirable bits in the
result.  As written I think this doesn't reduce the number of expressions
consistently even though it helps the testcase.

The reason it helps the testcase is the (convert @0) in the output pattern
turns out to be a common subexpression.  If (convert @0) is CSE-able, then this
does reduce the number of expressions.  Though we probably want to avoid
widening beyond a target's wordsize (which I loathe to test for).

Thoughts anyone?

[Bug tree-optimization/45397] [5/6/7 Regression] Issues with integer narrowing conversions

2016-08-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45397

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|4.9.4   |5.5

--- Comment #20 from Richard Biener  ---
GCC 4.9 branch is being closed