[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-06 Thread gergo.barany at inria dot fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

--- Comment #1 from Gergö Barany  ---
Sorry, forgot to add the command line. I use gcc -O3 on all platforms

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-06 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

--- Comment #2 from Andrew Pinski  ---
Most likely the optimization is in fold-const.c and has not been moved to
match.pd yet.

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-07-07
 CC||glisse at gcc dot gnu.org,
   ||jakub at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #3 from Jakub Jelinek  ---
Indeed, it is fold_div_compare in fold-const.c.  Perhaps we should move the
first part of that routine (the computation of lo and hi) into a helper
subroutine and use that in match.pd pattern that would then based on those
lo/hi trees decide how to optimize stuff?

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

Jakub Jelinek  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
Working on a patch.

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

--- Comment #5 from Jakub Jelinek  ---
Created attachment 41707
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41707&action=edit
gcc8-pr81346-wip.patch

Untested WIP patch.  Still no idea how to handle the build_range_check stuff at
GIMPLE, that function is pretty huge.  Shall I just always generate the cast to
utype and (acmp (minus (convert:utype @0) { lo; }) { himinuslo; })
after doing the etype/utype computation and verification in (with {...}) and
hope rest of match.pd optimizes that or add matchers for the various
optimizations in there?

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-10 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

--- Comment #6 from Marc Glisse  ---
(In reply to Jakub Jelinek from comment #5)
> Untested WIP patch.  Still no idea how to handle the build_range_check stuff
> at GIMPLE, that function is pretty huge.  Shall I just always generate the
> cast to utype and (acmp (minus (convert:utype @0) { lo; }) { himinuslo; })
> after doing the etype/utype computation and verification in (with {...}) and
> hope rest of match.pd optimizes that or add matchers for the various
> optimizations in there?

It seems reasonable to me to always emit something of the form (le (minus
(convert:utype @0) cst) cst) here (or maybe even a bit_and of comparisons with
the extremities as an intermediate step), and add a separate (rather long)
transformation that recognizes special cases of this pattern like (x-42u)<=-43u
(aka x>=42u). I am slightly concerned about unsafe things happening to
pointers, but I probably shouldn't be.

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-14 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

Jakub Jelinek  changed:

   What|Removed |Added

  Attachment #41707|0   |1
is obsolete||

--- Comment #7 from Jakub Jelinek  ---
Created attachment 41760
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41760&action=edit
gcc8-pr81346-wip.patch

Updated patch that does this, by moving the code from build_range_check into a
helper function that can be used in two spots it is actually small.

The remaining tasks are testsuite coverage and then match.pd optimizations for
the build_range_check stuff where needed.

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-14 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

--- Comment #8 from Marc Glisse  ---
I think always using an unsigned type for the range check would be simpler. If
we try to check that x>=INT_MIN+2 && x<=INT_MAX-2 with -fwrapv, int is still
not a suitable type in which to do x-(INT_MIN+2)<=INT_MAX-2-(INT_MIN+2), while
the issue doesn't exist with an unsigned type.
I notice you call build_range_check in GENERIC (and new code for GIMPLE). Is
that temporary until match.pd can optimize range checks?
Do we want :s on trunc_div?

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-14 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

--- Comment #9 from Jakub Jelinek  ---
(In reply to Marc Glisse from comment #8)
> I think always using an unsigned type for the range check would be simpler.
> If we try to check that x>=INT_MIN+2 && x<=INT_MAX-2 with -fwrapv, int is
> still not a suitable type in which to do
> x-(INT_MIN+2)<=INT_MAX-2-(INT_MIN+2), while the issue doesn't exist with an
> unsigned type.

I'm trying to preserve what we did before, it can be tweaked incrementally if
needed.

> I notice you call build_range_check in GENERIC (and new code for GIMPLE). Is
> that temporary until match.pd can optimize range checks?

As long as we keep build_range_check, which is used in multiple other places in
fold-const.c as well as the reassoc pass, using that for GENERIC looks
faster/simpler than emitting the comparison and then optimizing it.

> Do we want :s on trunc_div?

Yes, will do that.

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-14 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

--- Comment #10 from Marc Glisse  ---
(In reply to Jakub Jelinek from comment #9)
> (In reply to Marc Glisse from comment #8)
> > I think always using an unsigned type for the range check would be simpler.
> > If we try to check that x>=INT_MIN+2 && x<=INT_MAX-2 with -fwrapv, int is
> > still not a suitable type in which to do
> > x-(INT_MIN+2)<=INT_MAX-2-(INT_MIN+2), while the issue doesn't exist with an
> > unsigned type.
> 
> I'm trying to preserve what we did before, it can be tweaked incrementally
> if needed.

Then you may need to check for overflow in "hi = const_binop (MINUS_EXPR,
etype, hi, lo);", current build_range_check has "if (value != 0 &&
!TREE_OVERFLOW (value))" for the result of that operation. That should matter
for instance when simplifying X/INT_MAX==0.

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

Jakub Jelinek  changed:

   What|Removed |Added

  Attachment #41760|0   |1
is obsolete||

--- Comment #11 from Jakub Jelinek  ---
Created attachment 41780
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41780&action=edit
gcc8-pr81346.patch

Updated patch.

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

--- Comment #12 from Jakub Jelinek  ---
Created attachment 41781
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41781&action=edit
gcc8-pr81346-2.patch

Further optimization from build_range_check.

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-18 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

--- Comment #13 from Marc Glisse  ---
(In reply to Jakub Jelinek from comment #12)
> Created attachment 41781 [details]
> gcc8-pr81346-2.patch
> 
> Further optimization from build_range_check.

I wonder if "1" is that special, this optimization basically applies to any
range that ends at INT_MAX, turning (X-C1)<=C2 into (signed)X>=C3. Or do we
consider that only the case that yields a simple sign check is a win?

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

--- Comment #14 from Jakub Jelinek  ---
Author: jakub
Date: Wed Jul 19 12:31:59 2017
New Revision: 250338

URL: https://gcc.gnu.org/viewcvs?rev=250338&root=gcc&view=rev
Log:
PR tree-optimization/81346
* fold-const.h (fold_div_compare, range_check_type): Declare.
* fold-const.c (range_check_type): New function.
(build_range_check): Use range_check_type.
(fold_div_compare): No longer static, rewritten into
a match.pd helper function.
(fold_comparison): Don't call fold_div_compare here.
* match.pd (X / C1 op C2): New optimization using fold_div_compare
as helper function.

* gcc.dg/tree-ssa/pr81346-1.c: New test.
* gcc.dg/tree-ssa/pr81346-2.c: New test.
* gcc.dg/tree-ssa/pr81346-3.c: New test.
* gcc.dg/tree-ssa/pr81346-4.c: New test.
* gcc.target/i386/umod-3.c: Hide comparison against 1 from the
compiler to avoid X / C1 op C2 optimization to trigger.

Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr81346-1.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr81346-2.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr81346-3.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr81346-4.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/fold-const.c
trunk/gcc/fold-const.h
trunk/gcc/match.pd
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/i386/umod-3.c

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

--- Comment #15 from Jakub Jelinek  ---
Author: jakub
Date: Wed Jul 19 13:10:05 2017
New Revision: 250342

URL: https://gcc.gnu.org/viewcvs?rev=250342&root=gcc&view=rev
Log:
PR tree-optimization/81346
* match.pd: Optimize (X - 1U) <= INT_MAX-1U into (int) X > 0.

* gcc.dg/tree-ssa/pr81346-5.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr81346-5.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/match.pd
trunk/gcc/testsuite/ChangeLog

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-07-20 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #16 from Jakub Jelinek  ---
Fixed for 8.1+.


[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-09-14 Thread gergo.barany at inria dot fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

--- Comment #17 from Gergö Barany  ---
Thanks for fixing this. I did notice a small thing that might be considered a
tiny regression due to the fix.

If the divisor is a small power of 2, as in the following example:

int fn1(char p1) {
  long a;
  char b;
  int c = a = 4;
  b = !(p1 / a);
  if (b)
c = 0;
  return c;
}

the division used to be replaced by a shift that updated the condition code
register (again, on ARM; r250337):

lsrsr3, r0, #2
movne   r0, #4
moveq   r0, #0
bx  lr

whereas after the fix (tested on r250342) the new folding rule takes precedence
and generates one instruction more:

add r0, r0, #3
cmp r0, #6
movhi   r0, #4
movls   r0, #0
bx  lr

I guess the rule could be updated to only apply if the divisor is not a small
power of 2, or folding a division by a power of 2 into a shift could be
prioritized.

Sorry about only pointing this out two months later! Also, let me stress that I
do not have code that depends on this transformation. This came out of research
I'm doing on missed optimization, and this was one example I found interesting.

[Bug tree-optimization/81346] Missed constant propagation into comparison

2017-09-14 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81346

--- Comment #18 from Marc Glisse  ---
(In reply to Gergö Barany from comment #17)
> the division used to be replaced by a shift that updated the condition code
> register (again, on ARM; r250337):

(just my opinion)
At a high level (gimple), (unsigned)x+3<=6 seems like a more canonical way to
represent an interval than x/4==0. If the second one turns out to be more
efficient on some targets, it sounds like we could later turn (unsigned)x+3<=6
into x/4==0 (even if the user did not write it that way), i.e. add a new
transform at RTL time. Looks like a separate enhancement request would be
appropriate.