subject:"\[PATCH\] Optimize \(CST1 A\) == CST2 \(PR tree\-optimization\/66299\)"

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

2015-06-09 Thread Richard Biener

On Mon, Jun 8, 2015 at 7:03 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Mon, 8 Jun 2015, Marek Polacek wrote:

   PR tree-optimization/66299
   * match.pd ((CST1  A) == CST2 - A == ctz (CST2) - ctz (CST1)
   ((CST1  A) != CST2 - A != ctz (CST2) - ctz (CST1)): New


 You are braver than I am, I would have abbreviated ctz (CST2) - ctz (CST1)
 to CST3 in the ChangeLog ;-)

 +/* (CST1  A) == CST2 - A == ctz (CST2) - ctz (CST1)
 +   (CST1  A) != CST2 - A != ctz (CST2) - ctz (CST1)
 +   if CST2 != 0.  */
 +(for cmp (ne eq)
 + (simplify
 +  (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2)
 +  (with {
 +   unsigned int cand = wi::ctz (@2) - wi::ctz (@0); }
 +  (if (!integer_zerop (@2)


 You can probably use directly wi::ne_p (@2, 0) here. Shouldn't this be
 indented one space more?

Yes, one space more.  I suppose using integer_zerop might in theory
allow for handling vector shifts at some point ...?

 +wi::eq_p (wi::lshift (@0, cand), @2))
 +   (cmp @1 { build_int_cst (TREE_TYPE (@1), cand); })


 Making 'cand' signed, you could return 0 when cand0, like (2x)==1. You
 could also return 0 when the candidate turns out not to work: (3x)==4.

Sounds like a good improvement.

 Tweaking it so that (6X)==0 becomes X=31 for TYPE_OVERFLOW_WRAPS and
 false for TYPE_OVERFLOW_UNDEFINED is probably more controversial.

Hm, yes.  I think signed overflow != shift amount overflow, so testing
the overflow
macros for this isn't valid.

Otherwise the patch looks ok to me as well - mind doing the improvement above?

Thanks,
Richard.

 FWIW, the patch looks good to me, thanks.

 --
 Marc Glisse

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

2015-06-09 Thread Richard Biener

On Tue, 9 Jun 2015, Marc Glisse wrote:

 On Tue, 9 Jun 2015, Richard Biener wrote:
 
   Tweaking it so that (6X)==0 becomes X=31 for TYPE_OVERFLOW_WRAPS and
   false for TYPE_OVERFLOW_UNDEFINED is probably more controversial.
  
  Hm, yes.  I think signed overflow != shift amount overflow, so testing the
  overflow macros for this isn't valid.
 
 Would it be ok to always turn it to X=31 then? (the value 31 is conveniently
 already computed in 'cand')

I think so.

Richard.

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

2015-06-09 Thread Richard Biener

On Tue, 9 Jun 2015, Richard Biener wrote:

 On Tue, 9 Jun 2015, Marc Glisse wrote:
 
  On Tue, 9 Jun 2015, Richard Biener wrote:
  
Tweaking it so that (6X)==0 becomes X=31 for TYPE_OVERFLOW_WRAPS and
false for TYPE_OVERFLOW_UNDEFINED is probably more controversial.
   
   Hm, yes.  I think signed overflow != shift amount overflow, so testing the
   overflow macros for this isn't valid.
  
  Would it be ok to always turn it to X=31 then? (the value 31 is 
  conveniently
  already computed in 'cand')
 
 I think so.

Or even ((unsigned)X - 31)  1 (I probably got that wrong) to properly
say X=29  X32, that is, preserve the implicit upper bound on X
we have because it is used in a shift.

Richard.

 Richard.
 

-- 
Richard Biener rguent...@suse.de
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

2015-06-09 Thread Marc Glisse


On Tue, 9 Jun 2015, Richard Biener wrote:


Tweaking it so that (6X)==0 becomes X=31 for TYPE_OVERFLOW_WRAPS and
false for TYPE_OVERFLOW_UNDEFINED is probably more controversial.


Hm, yes.  I think signed overflow != shift amount overflow, so testing 
the overflow macros for this isn't valid.


Would it be ok to always turn it to X=31 then? (the value 31 is 
conveniently already computed in 'cand')


--
Marc Glisse

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

2015-06-09 Thread Richard Biener

On Tue, 9 Jun 2015, Marek Polacek wrote:

 On Tue, Jun 09, 2015 at 09:53:21AM +0200, Richard Biener wrote:
   +/* (CST1  A) == CST2 - A == ctz (CST2) - ctz (CST1)
   +   (CST1  A) != CST2 - A != ctz (CST2) - ctz (CST1)
   +   if CST2 != 0.  */
   +(for cmp (ne eq)
   + (simplify
   +  (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2)
   +  (with {
   +   unsigned int cand = wi::ctz (@2) - wi::ctz (@0); }
   +  (if (!integer_zerop (@2)
  
  
   You can probably use directly wi::ne_p (@2, 0) here. Shouldn't this be
   indented one space more?
  
  Yes, one space more.  I suppose using integer_zerop might in theory
  allow for handling vector shifts at some point ...?
  
 Fixed the formatting.  But I kept the integer_zerop.
 
   +wi::eq_p (wi::lshift (@0, cand), @2))
   +   (cmp @1 { build_int_cst (TREE_TYPE (@1), cand); })
  
  
   Making 'cand' signed, you could return 0 when cand0, like (2x)==1. You
   could also return 0 when the candidate turns out not to work: (3x)==4.
  
  Sounds like a good improvement.
  
 Yeah, it makes sense to do that while I'm on this.  Done, and a new
 test added.
 
  Otherwise the patch looks ok to me as well - mind doing the improvement 
  above?
 
 Thank you both.  How does this look now?
 
 Bootstrapped/regtested on x86_64-linux, ok for trunk?

Ok.

Thanks,
Richard.

 2015-06-09  Marek Polacek  pola...@redhat.com
 
   PR tree-optimization/66299
   * match.pd ((CST1  A) == CST2 - A == ctz (CST2) - ctz (CST1)
   ((CST1  A) != CST2 - A != ctz (CST2) - ctz (CST1)): New
   patterns.
 
   * gcc.dg/pr66299-1.c: New test.
   * gcc.dg/pr66299-2.c: New test.
   * gcc.dg/pr66299-3.c: New test.
 
 diff --git gcc/match.pd gcc/match.pd
 index abd7851..560b218 100644
 --- gcc/match.pd
 +++ gcc/match.pd
 @@ -676,6 +676,21 @@ along with GCC; see the file COPYING3.  If not see
(cmp (bit_and (lshift integer_onep @0) integer_onep) integer_zerop)
(icmp @0 { build_zero_cst (TREE_TYPE (@0)); })))
  
 +/* (CST1  A) == CST2 - A == ctz (CST2) - ctz (CST1)
 +   (CST1  A) != CST2 - A != ctz (CST2) - ctz (CST1)
 +   if CST2 != 0.  */
 +(for cmp (ne eq)
 + (simplify
 +  (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2)
 +  (with { int cand = wi::ctz (@2) - wi::ctz (@0); }
 +   (if (cand  0
 + || (!integer_zerop (@2)
 +  wi::ne_p (wi::lshift (@0, cand), @2)))
 +{ constant_boolean_node (cmp == NE_EXPR, type); })
 +   (if (!integer_zerop (@2)
 +  wi::eq_p (wi::lshift (@0, cand), @2))
 +(cmp @1 { build_int_cst (TREE_TYPE (@1), cand); })
 +
  /* Simplifications of conversions.  */
  
  /* Basic strip-useless-type-conversions / strip_nops.  */
 diff --git gcc/testsuite/gcc.dg/pr66299-1.c gcc/testsuite/gcc.dg/pr66299-1.c
 index e69de29..e75146b 100644
 --- gcc/testsuite/gcc.dg/pr66299-1.c
 +++ gcc/testsuite/gcc.dg/pr66299-1.c
 @@ -0,0 +1,92 @@
 +/* PR tree-optimization/66299 */
 +/* { dg-do run } */
 +/* { dg-options -fdump-tree-original } */
 +
 +void
 +test1 (int x)
 +{
 +  if ((0  x) != 0
 +  || (1  x) != 2
 +  || (2  x) != 4
 +  || (3  x) != 6
 +  || (4  x) != 8
 +  || (5  x) != 10
 +  || (6  x) != 12
 +  || (7  x) != 14
 +  || (8  x) != 16
 +  || (9  x) != 18
 +  || (10  x) != 20)
 +__builtin_abort ();
 +}
 +
 +void
 +test2 (int x)
 +{
 +  if (!((0  x) == 0
 +  (1  x) == 4
 +  (2  x) == 8
 +  (3  x) == 12
 +  (4  x) == 16
 +  (5  x) == 20
 +  (6  x) == 24
 +  (7  x) == 28
 +  (8  x) == 32
 +  (9  x) == 36
 +  (10  x) == 40))
 +__builtin_abort ();
 +}
 +
 +void
 +test3 (unsigned int x)
 +{
 +  if ((0U  x) != 0U
 +  || (1U  x) != 16U
 +  || (2U  x) != 32U
 +  || (3U  x) != 48U
 +  || (4U  x) != 64U
 +  || (5U  x) != 80U
 +  || (6U  x) != 96U
 +  || (7U  x) != 112U
 +  || (8U  x) != 128U
 +  || (9U  x) != 144U
 +  || (10U  x) != 160U)
 +__builtin_abort ();
 +}
 +
 +void
 +test4 (unsigned int x)
 +{
 +  if (!((0U  x) == 0U
 + || (1U  x) == 8U
 + || (2U  x) == 16U
 + || (3U  x) == 24U
 + || (4U  x) == 32U
 + || (5U  x) == 40U
 + || (6U  x) == 48U
 + || (7U  x) == 56U
 + || (8U  x) == 64U
 + || (9U  x) == 72U
 + || (10U  x) == 80U))
 +__builtin_abort ();
 +}
 +
 +void
 +test5 (int x)
 +{
 +  if ((0  x) == 1
 +  || (0  x) != 0
 +  || (0x8001U  x) != 0x2U)
 +__builtin_abort ();
 +}
 +
 +int
 +main (void)
 +{
 +  test1 (1);
 +  test2 (2);
 +  test3 (4U);
 +  test4 (3U);
 +  test5 (17);
 +}
 +
 +/* { dg-final { scan-tree-dump-not  original } } */
 diff --git gcc/testsuite/gcc.dg/pr66299-2.c gcc/testsuite/gcc.dg/pr66299-2.c
 index e69de29..45e9218 100644
 --- gcc/testsuite/gcc.dg/pr66299-2.c
 +++ gcc/testsuite/gcc.dg/pr66299-2.c
 @@ -0,0 +1,33 @@
 +/* PR tree-optimization/66299 */
 +/* { dg-do run } */
 +/* { dg-options -fdump-tree-optimized -O } */
 +
 +void
 +test1 (int x, unsigned u)
 +{
 +  if ((1U  x) != 64
 +  || (2  x) != u
 +  || (x  x) != 384

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

2015-06-09 Thread Marc Glisse


On Tue, 9 Jun 2015, Richard Biener wrote:


On Tue, 9 Jun 2015, Richard Biener wrote:


On Tue, 9 Jun 2015, Marc Glisse wrote:


On Tue, 9 Jun 2015, Richard Biener wrote:


Tweaking it so that (6X)==0 becomes X=31 for TYPE_OVERFLOW_WRAPS and
false for TYPE_OVERFLOW_UNDEFINED is probably more controversial.


Hm, yes.  I think signed overflow != shift amount overflow, so testing the
overflow macros for this isn't valid.


Would it be ok to always turn it to X=31 then? (the value 31 is conveniently
already computed in 'cand')


I think so.


Or even ((unsigned)X - 31)  1 (I probably got that wrong) to properly
say X=29  X32, that is, preserve the implicit upper bound on X
we have because it is used in a shift.


I don't understand in what sense this preserves the upper bound. I would 
understand storing a range for X (when it is an SSA_NAME, and it would 
require a lot of care not to propagate backwards too far), or more simply 
introducing if(X=32) __builtin_unreachable();. But you seem to be talking 
about generating more complicated code so that if someone checks 
(6123)==0 it returns false?


--
Marc Glisse

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

2015-06-09 Thread Richard Biener

On Tue, 9 Jun 2015, Marc Glisse wrote:

 On Tue, 9 Jun 2015, Richard Biener wrote:
 
  On Tue, 9 Jun 2015, Richard Biener wrote:
  
   On Tue, 9 Jun 2015, Marc Glisse wrote:
   
On Tue, 9 Jun 2015, Richard Biener wrote:

  Tweaking it so that (6X)==0 becomes X=31 for TYPE_OVERFLOW_WRAPS
  and
  false for TYPE_OVERFLOW_UNDEFINED is probably more controversial.
 
 Hm, yes.  I think signed overflow != shift amount overflow, so testing
 the
 overflow macros for this isn't valid.

Would it be ok to always turn it to X=31 then? (the value 31 is
conveniently
already computed in 'cand')
   
   I think so.
  
  Or even ((unsigned)X - 31)  1 (I probably got that wrong) to properly
  say X=29  X32, that is, preserve the implicit upper bound on X
  we have because it is used in a shift.
 
 I don't understand in what sense this preserves the upper bound. I would
 understand storing a range for X (when it is an SSA_NAME, and it would require
 a lot of care not to propagate backwards too far), or more simply introducing
 if(X=32) __builtin_unreachable();. But you seem to be talking about
 generating more complicated code so that if someone checks (6123)==0 it
 returns false?

Well, I'm mixing simplifying the computation and preserving extra
info we got from the complex computation.  So yes, you are right.

Richard.

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

2015-06-09 Thread Marek Polacek

On Tue, Jun 09, 2015 at 09:53:21AM +0200, Richard Biener wrote:
  +/* (CST1  A) == CST2 - A == ctz (CST2) - ctz (CST1)
  +   (CST1  A) != CST2 - A != ctz (CST2) - ctz (CST1)
  +   if CST2 != 0.  */
  +(for cmp (ne eq)
  + (simplify
  +  (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2)
  +  (with {
  +   unsigned int cand = wi::ctz (@2) - wi::ctz (@0); }
  +  (if (!integer_zerop (@2)
 
 
  You can probably use directly wi::ne_p (@2, 0) here. Shouldn't this be
  indented one space more?
 
 Yes, one space more.  I suppose using integer_zerop might in theory
 allow for handling vector shifts at some point ...?
 
Fixed the formatting.  But I kept the integer_zerop.

  +wi::eq_p (wi::lshift (@0, cand), @2))
  +   (cmp @1 { build_int_cst (TREE_TYPE (@1), cand); })
 
 
  Making 'cand' signed, you could return 0 when cand0, like (2x)==1. You
  could also return 0 when the candidate turns out not to work: (3x)==4.
 
 Sounds like a good improvement.
 
Yeah, it makes sense to do that while I'm on this.  Done, and a new
test added.

 Otherwise the patch looks ok to me as well - mind doing the improvement above?

Thank you both.  How does this look now?

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-06-09  Marek Polacek  pola...@redhat.com

PR tree-optimization/66299
* match.pd ((CST1  A) == CST2 - A == ctz (CST2) - ctz (CST1)
((CST1  A) != CST2 - A != ctz (CST2) - ctz (CST1)): New
patterns.

* gcc.dg/pr66299-1.c: New test.
* gcc.dg/pr66299-2.c: New test.
* gcc.dg/pr66299-3.c: New test.

diff --git gcc/match.pd gcc/match.pd
index abd7851..560b218 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -676,6 +676,21 @@ along with GCC; see the file COPYING3.  If not see
   (cmp (bit_and (lshift integer_onep @0) integer_onep) integer_zerop)
   (icmp @0 { build_zero_cst (TREE_TYPE (@0)); })))
 
+/* (CST1  A) == CST2 - A == ctz (CST2) - ctz (CST1)
+   (CST1  A) != CST2 - A != ctz (CST2) - ctz (CST1)
+   if CST2 != 0.  */
+(for cmp (ne eq)
+ (simplify
+  (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2)
+  (with { int cand = wi::ctz (@2) - wi::ctz (@0); }
+   (if (cand  0
+   || (!integer_zerop (@2)
+wi::ne_p (wi::lshift (@0, cand), @2)))
+{ constant_boolean_node (cmp == NE_EXPR, type); })
+   (if (!integer_zerop (@2)
+wi::eq_p (wi::lshift (@0, cand), @2))
+(cmp @1 { build_int_cst (TREE_TYPE (@1), cand); })
+
 /* Simplifications of conversions.  */
 
 /* Basic strip-useless-type-conversions / strip_nops.  */
diff --git gcc/testsuite/gcc.dg/pr66299-1.c gcc/testsuite/gcc.dg/pr66299-1.c
index e69de29..e75146b 100644
--- gcc/testsuite/gcc.dg/pr66299-1.c
+++ gcc/testsuite/gcc.dg/pr66299-1.c
@@ -0,0 +1,92 @@
+/* PR tree-optimization/66299 */
+/* { dg-do run } */
+/* { dg-options -fdump-tree-original } */
+
+void
+test1 (int x)
+{
+  if ((0  x) != 0
+  || (1  x) != 2
+  || (2  x) != 4
+  || (3  x) != 6
+  || (4  x) != 8
+  || (5  x) != 10
+  || (6  x) != 12
+  || (7  x) != 14
+  || (8  x) != 16
+  || (9  x) != 18
+  || (10  x) != 20)
+__builtin_abort ();
+}
+
+void
+test2 (int x)
+{
+  if (!((0  x) == 0
+(1  x) == 4
+(2  x) == 8
+(3  x) == 12
+(4  x) == 16
+(5  x) == 20
+(6  x) == 24
+(7  x) == 28
+(8  x) == 32
+(9  x) == 36
+(10  x) == 40))
+__builtin_abort ();
+}
+
+void
+test3 (unsigned int x)
+{
+  if ((0U  x) != 0U
+  || (1U  x) != 16U
+  || (2U  x) != 32U
+  || (3U  x) != 48U
+  || (4U  x) != 64U
+  || (5U  x) != 80U
+  || (6U  x) != 96U
+  || (7U  x) != 112U
+  || (8U  x) != 128U
+  || (9U  x) != 144U
+  || (10U  x) != 160U)
+__builtin_abort ();
+}
+
+void
+test4 (unsigned int x)
+{
+  if (!((0U  x) == 0U
+   || (1U  x) == 8U
+   || (2U  x) == 16U
+   || (3U  x) == 24U
+   || (4U  x) == 32U
+   || (5U  x) == 40U
+   || (6U  x) == 48U
+   || (7U  x) == 56U
+   || (8U  x) == 64U
+   || (9U  x) == 72U
+   || (10U  x) == 80U))
+__builtin_abort ();
+}
+
+void
+test5 (int x)
+{
+  if ((0  x) == 1
+  || (0  x) != 0
+  || (0x8001U  x) != 0x2U)
+__builtin_abort ();
+}
+
+int
+main (void)
+{
+  test1 (1);
+  test2 (2);
+  test3 (4U);
+  test4 (3U);
+  test5 (17);
+}
+
+/* { dg-final { scan-tree-dump-not  original } } */
diff --git gcc/testsuite/gcc.dg/pr66299-2.c gcc/testsuite/gcc.dg/pr66299-2.c
index e69de29..45e9218 100644
--- gcc/testsuite/gcc.dg/pr66299-2.c
+++ gcc/testsuite/gcc.dg/pr66299-2.c
@@ -0,0 +1,33 @@
+/* PR tree-optimization/66299 */
+/* { dg-do run } */
+/* { dg-options -fdump-tree-optimized -O } */
+
+void
+test1 (int x, unsigned u)
+{
+  if ((1U  x) != 64
+  || (2  x) != u
+  || (x  x) != 384
+  || (3  x) == 9
+  || (x  14) != 98304U
+  || (1  x) == 14
+  || (3  2) != 12)
+__builtin_abort ();
+}
+
+void
+test2 (int x)
+{
+  unsigned int t = ((unsigned int) 1U  x);

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

2015-06-08 Thread Marc Glisse


On Mon, 8 Jun 2015, Marek Polacek wrote:


  PR tree-optimization/66299
  * match.pd ((CST1  A) == CST2 - A == ctz (CST2) - ctz (CST1)
  ((CST1  A) != CST2 - A != ctz (CST2) - ctz (CST1)): New


You are braver than I am, I would have abbreviated ctz (CST2) - ctz (CST1)
to CST3 in the ChangeLog ;-)


+/* (CST1  A) == CST2 - A == ctz (CST2) - ctz (CST1)
+   (CST1  A) != CST2 - A != ctz (CST2) - ctz (CST1)
+   if CST2 != 0.  */
+(for cmp (ne eq)
+ (simplify
+  (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2)
+  (with {
+   unsigned int cand = wi::ctz (@2) - wi::ctz (@0); }
+  (if (!integer_zerop (@2)


You can probably use directly wi::ne_p (@2, 0) here. Shouldn't this be
indented one space more?


+wi::eq_p (wi::lshift (@0, cand), @2))
+   (cmp @1 { build_int_cst (TREE_TYPE (@1), cand); })


Making 'cand' signed, you could return 0 when cand0, like (2x)==1. You
could also return 0 when the candidate turns out not to work: (3x)==4.

Tweaking it so that (6X)==0 becomes X=31 for TYPE_OVERFLOW_WRAPS and
false for TYPE_OVERFLOW_UNDEFINED is probably more controversial.

FWIW, the patch looks good to me, thanks.

--
Marc Glisse

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

2015-06-08 Thread Marek Polacek

On Thu, May 28, 2015 at 09:48:10PM +0200, Marc Glisse wrote:
 Side note: if we are looking for extra patterns to simplify, llvm has an
 almost unlimited supply. Here are a few we don't seem to have (there are
 more where those came from), of course several need constraining /
 generalizing, it is just a list of hints I wrote for myself.
 
 (A|B)  ~(AB) - A^B
 (A | B)  ((~A) ^ B) - (A  B)
 (A  (~B)) | (A ^ B) - (A ^ B)
 ((B | C)  A) | B - B | (A  C)
 A | ( A ^ B) - A |  B
 A | (~A ^ B) - A | ~B
 (A ^ B)  ((B ^ C) ^ A) - (A ^ B)  ~C
 (A ^ B) | ((B ^ C) ^ A) - (A ^ B) | C
 (A  B) | (A ^ B) - (A | B)
 A | ~(A ^ B) - A | ~B
 (A  B) | ((~A) ^ B) - (~A ^ B)
 ~(~X  Y) - (X | ~Y)
 ~(~X s Y) - (X s Y)
 (A  B)^(A | B) - A ^ B
 (A | ~B) ^ (~A | B) - A ^ B
 (A  ~B) ^ (~A  B) - A ^ B
 (A ^ C)^(A | B) - ((~A)  B) ^ C
 (A  B) ^ (A ^ B) - (A | B)
 (A  ~B) ^ (~A) - ~(A  B)
 (AB)+(A^B) - A|B
 (AB)+(A|B) - A+B
 (A|B)-(A^B) - AB
 ((X | Y) - X) - (~X  Y)
 fmax(x,NaN) - x
 fmax(a,fmax(a,b)) - fmax(a,b)
 (X+2) u X - x u 256-2
 (1  X)   30 - X = 4
 ((X  ~7) == 0) - X  8
 2 * X  5 - X = 2
 ((1  x)8) == 0 - x != 3
 ((1  x)7) == 0 - x  2
 Y - Z  X - Z - Y  X
 3 * X == 3 * Y - X == Y
 A  3 == B  3 - (A ^ B)  8
 (float)int = 4.4 - int = 4
 x unle x - x ord x

Thanks for this list.  I'll look at implementing (some of) them.
 
 On Thu, 28 May 2015, Jakub Jelinek wrote:
 
 Is CST2 a multiple of CST1 the best test though?

Apparently not ;).

 I mean say in
 (0x8001U  x) == 0x2U
 0x2U isn't a multiple of 0x8001U, yet there is only one
 valid value of x for which it holds (17), so we could very well
 optimize that to x == 17.

Yeah.

 If popcount of the CST1 is 1, then multiple_of_p is supposedly sufficient
 (have you checked if CST1 is negative that it still works?), for others
 supposedly we could have a helper function that would just try
 in a loop all shift counts from 0 to precision - 1, and note when
 (CST1  b) == CST2 - if for no b, then it should fold regardless of
 has_single_use to false or true, if for exactly one shift count, then
 use a comparison against that shift count, otherwise give up?
 
 ctz(CST2)-ctz(CST1) should provide a single candidate without looping.
 ctz(CST1) is also relevant when CST2==0.

That seems to work well so the following patch is an attempt to do it
so.
If CST2 is non-zero, we compute a candidate and verify whether this
candidate works.  If so, we know there's exactly one so we should be
able to fold the shift into comparison.
I've tried even negative numbers and it seems to DTRT, but I'd certainly
appreciate if y'all could take a look at this.  Thanks.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-06-08  Marek Polacek  pola...@redhat.com

PR tree-optimization/66299
* match.pd ((CST1  A) == CST2 - A == ctz (CST2) - ctz (CST1)
((CST1  A) != CST2 - A != ctz (CST2) - ctz (CST1)): New
patterns.

* gcc.dg/pr66299-1.c: New test.
* gcc.dg/pr66299-2.c: New test.

diff --git gcc/match.pd gcc/match.pd
index abd7851..32e913c 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -676,6 +676,18 @@ along with GCC; see the file COPYING3.  If not see
   (cmp (bit_and (lshift integer_onep @0) integer_onep) integer_zerop)
   (icmp @0 { build_zero_cst (TREE_TYPE (@0)); })))
 
+/* (CST1  A) == CST2 - A == ctz (CST2) - ctz (CST1)
+   (CST1  A) != CST2 - A != ctz (CST2) - ctz (CST1)
+   if CST2 != 0.  */
+(for cmp (ne eq)
+ (simplify
+  (cmp (lshift INTEGER_CST@0 @1) INTEGER_CST@2)
+  (with {
+   unsigned int cand = wi::ctz (@2) - wi::ctz (@0); }
+  (if (!integer_zerop (@2)
+wi::eq_p (wi::lshift (@0, cand), @2))
+   (cmp @1 { build_int_cst (TREE_TYPE (@1), cand); })
+
 /* Simplifications of conversions.  */
 
 /* Basic strip-useless-type-conversions / strip_nops.  */
diff --git gcc/testsuite/gcc.dg/pr66299-1.c gcc/testsuite/gcc.dg/pr66299-1.c
index e69de29..e7b978d 100644
--- gcc/testsuite/gcc.dg/pr66299-1.c
+++ gcc/testsuite/gcc.dg/pr66299-1.c
@@ -0,0 +1,92 @@
+/* PR tree-optimization/66299 */
+/* { dg-do run } */
+/* { dg-options -fdump-tree-original } */
+
+void
+test1 (int x)
+{
+  if ((0  x) != 0
+  || (1  x) != 2
+  || (2  x) != 4
+  || (3  x) != 6
+  || (4  x) != 8
+  || (5  x) != 10
+  || (6  x) != 12
+  || (7  x) != 14
+  || (8  x) != 16
+  || (9  x) != 18
+  || (10  x) != 20)
+__builtin_abort ();
+}
+
+void
+test2 (int x)
+{
+  if (!((0  x) == 0
+ (1  x) == 4
+ (2  x) == 8
+ (3  x) == 12
+ (4  x) == 16
+ (5  x) == 20
+ (6  x) == 24
+ (7  x) == 28
+ (8  x) == 32
+ (9  x) == 36
+(10  x) == 40))
+__builtin_abort ();
+}
+
+void
+test3 (unsigned int x)
+{
+  if ((0U  x) != 0U
+  || (1U  x) != 16U
+  || (2U  x) != 32U
+  || (3U  x) != 48U
+  || (4U  x) != 64U
+  || (5U  x) != 80U
+  || (6U  x) != 96U
+  || (7U  x) != 112U
+  || (8U  x) != 128U
+  || (9U  x) != 144U
+  || (10U  x)

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

2015-05-28 Thread Marc Glisse


On Thu, 28 May 2015, Marek Polacek wrote:


This PR points out that we weren't able to optimize 1  x == 2 to just
x == 1.


Side note: if we are looking for extra patterns to simplify, llvm has an 
almost unlimited supply. Here are a few we don't seem to have (there are 
more where those came from), of course several need constraining / 
generalizing, it is just a list of hints I wrote for myself.


(A|B)  ~(AB) - A^B
(A | B)  ((~A) ^ B) - (A  B)
(A  (~B)) | (A ^ B) - (A ^ B)
((B | C)  A) | B - B | (A  C)
A | ( A ^ B) - A |  B
A | (~A ^ B) - A | ~B
(A ^ B)  ((B ^ C) ^ A) - (A ^ B)  ~C
(A ^ B) | ((B ^ C) ^ A) - (A ^ B) | C
(A  B) | (A ^ B) - (A | B)
A | ~(A ^ B) - A | ~B
(A  B) | ((~A) ^ B) - (~A ^ B)
~(~X  Y) - (X | ~Y)
~(~X s Y) - (X s Y)
(A  B)^(A | B) - A ^ B
(A | ~B) ^ (~A | B) - A ^ B
(A  ~B) ^ (~A  B) - A ^ B
(A ^ C)^(A | B) - ((~A)  B) ^ C
(A  B) ^ (A ^ B) - (A | B)
(A  ~B) ^ (~A) - ~(A  B)
(AB)+(A^B) - A|B
(AB)+(A|B) - A+B
(A|B)-(A^B) - AB
((X | Y) - X) - (~X  Y)
fmax(x,NaN) - x
fmax(a,fmax(a,b)) - fmax(a,b)
(X+2) u X - x u 256-2
(1  X)   30 - X = 4
((X  ~7) == 0) - X  8
2 * X  5 - X = 2
((1  x)8) == 0 - x != 3
((1  x)7) == 0 - x  2
Y - Z  X - Z - Y  X
3 * X == 3 * Y - X == Y
A  3 == B  3 - (A ^ B)  8
(float)int = 4.4 - int = 4
x unle x - x ord x



+/* (CST1  A) == CST2 - A == log2 (CST2 / CST1)
+   (CST1  A) != CST2 - A != log2 (CST2 / CST1)
+   if CST2 is a multiple of CST1.  */
+(for cmp (ne eq)
+ (simplify
+  (cmp (lshift@3 INTEGER_CST@0 @1) INTEGER_CST@2)
+  (if ((TREE_CODE (@3) != SSA_NAME || has_single_use (@3))
+wi::multiple_of_p (@2, @0, TYPE_SIGN (type)))


Doesn't type refer to the result of the EQ_EXPR here?


On Thu, 28 May 2015, Jakub Jelinek wrote:


Is CST2 a multiple of CST1 the best test though?
I mean say in
(0x8001U  x) == 0x2U
0x2U isn't a multiple of 0x8001U, yet there is only one
valid value of x for which it holds (17), so we could very well
optimize that to x == 17.
If popcount of the CST1 is 1, then multiple_of_p is supposedly sufficient
(have you checked if CST1 is negative that it still works?), for others
supposedly we could have a helper function that would just try
in a loop all shift counts from 0 to precision - 1, and note when
(CST1  b) == CST2 - if for no b, then it should fold regardless of
has_single_use to false or true, if for exactly one shift count, then
use a comparison against that shift count, otherwise give up?


ctz(CST2)-ctz(CST1) should provide a single candidate without looping. 
ctz(CST1) is also relevant when CST2==0.


--
Marc Glisse

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

2015-05-28 Thread Jakub Jelinek

On Thu, May 28, 2015 at 02:15:45PM +0200, Marek Polacek wrote:
 This PR points out that we weren't able to optimize 1  x == 2 to just
 x == 1.  This is my attempt to fix that: if we see (CST1  A) == CST2
 and CST2 is a multiple of CST1, use log2 to get rid of the shift, but
 only if the result of the shift is a natural number (including zero).

Is CST2 a multiple of CST1 the best test though?
I mean say in
(0x8001U  x) == 0x2U
0x2U isn't a multiple of 0x8001U, yet there is only one
valid value of x for which it holds (17), so we could very well
optimize that to x == 17.
If popcount of the CST1 is 1, then multiple_of_p is supposedly sufficient
(have you checked if CST1 is negative that it still works?), for others
supposedly we could have a helper function that would just try
in a loop all shift counts from 0 to precision - 1, and note when
(CST1  b) == CST2 - if for no b, then it should fold regardless of
has_single_use to false or true, if for exactly one shift count, then
use a comparison against that shift count, otherwise give up?
Supposedly (CST1  A) == CST2 can be handled similarly.

 If CST2 is not a multiple of CST1, then the whole expression can be
 discarded, but I'd like to do that as a follow-up.
 (It would help if our current match.pd grammar allowed us to use else,
 any plans on doing that?)

Jakub

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

2015-05-28 Thread Richard Biener

On Thu, May 28, 2015 at 2:15 PM, Marek Polacek pola...@redhat.com wrote:
 This PR points out that we weren't able to optimize 1  x == 2 to just
 x == 1.  This is my attempt to fix that: if we see (CST1  A) == CST2
 and CST2 is a multiple of CST1, use log2 to get rid of the shift, but
 only if the result of the shift is a natural number (including zero).

 If CST2 is not a multiple of CST1, then the whole expression can be
 discarded, but I'd like to do that as a follow-up.
 (It would help if our current match.pd grammar allowed us to use else,
 any plans on doing that?)

 Bootstrapped/regtested on x86_64-linux, ok for trunk?

 2015-05-28  Marek Polacek  pola...@redhat.com

 PR tree-optimization/66299
 * match.pd ((CST1  A) == CST2 - A == log2 (CST2 / CST1),
 (CST1  A) != CST2 - A != log2 (CST2 / CST1)): New
 patterns.

 * gcc.dg/pr66299-1.c: New test.
 * gcc.dg/pr66299-2.c: New test.

 diff --git gcc/match.pd gcc/match.pd
 index abd7851..5d07a70 100644
 --- gcc/match.pd
 +++ gcc/match.pd
 @@ -676,6 +676,19 @@ along with GCC; see the file COPYING3.  If not see
(cmp (bit_and (lshift integer_onep @0) integer_onep) integer_zerop)
(icmp @0 { build_zero_cst (TREE_TYPE (@0)); })))

 +/* (CST1  A) == CST2 - A == log2 (CST2 / CST1)
 +   (CST1  A) != CST2 - A != log2 (CST2 / CST1)
 +   if CST2 is a multiple of CST1.  */
 +(for cmp (ne eq)
 + (simplify
 +  (cmp (lshift@3 INTEGER_CST@0 @1) INTEGER_CST@2)
 +  (if ((TREE_CODE (@3) != SSA_NAME || has_single_use (@3))

I think we have the single_use (@3) helper now.  Not sure why you
restrict this here though - we are only creating new constants.

Ok with dropping the single-use check.

 +wi::multiple_of_p (@2, @0, TYPE_SIGN (type)))
 +   (with {
 +int shift = wi::exact_log2 (wi::div_trunc (@2, @0, TYPE_SIGN (type))); }
 +   (if (shift != -1)
 +(cmp @1 { build_int_cst (TREE_TYPE (@1), shift); }))

so with else you mean

(if (shift != -1)
  ...
  (else-expr ...))

?  Sure that's possible.  Today you can write

   (if (shift != -1)
  ...)
   (if (shift == -1)
 ...)

or

   (if (shift != -1)
 )
   (else-expr)

which is equivalent if the if is the only one at the nesting level and
the then-expr
doesn't contain any further ones.  That is, it is equivalent to

 if () ...;
 else-expr;

thus the fall-thru

So (if ...) would get an optional else-expr, yes, that sounds useful.
I think we
already have some (if A ..) (if !A ..) in match.pd.

Thanks,
Richard.

 +
  /* Simplifications of conversions.  */

  /* Basic strip-useless-type-conversions / strip_nops.  */
 diff --git gcc/testsuite/gcc.dg/pr66299-1.c gcc/testsuite/gcc.dg/pr66299-1.c
 index e69de29..9d41275 100644
 --- gcc/testsuite/gcc.dg/pr66299-1.c
 +++ gcc/testsuite/gcc.dg/pr66299-1.c
 @@ -0,0 +1,83 @@
 +/* PR tree-optimization/66299 */
 +/* { dg-do run } */
 +/* { dg-options -fdump-tree-original } */
 +
 +void
 +test1 (int x)
 +{
 +  if ((0  x) != 0
 +  || (1  x) != 2
 +  || (2  x) != 4
 +  || (3  x) != 6
 +  || (4  x) != 8
 +  || (5  x) != 10
 +  || (6  x) != 12
 +  || (7  x) != 14
 +  || (8  x) != 16
 +  || (9  x) != 18
 +  || (10  x) != 20)
 +__builtin_abort ();
 +}
 +
 +void
 +test2 (int x)
 +{
 +  if (!((0  x) == 0
 + (1  x) == 4
 + (2  x) == 8
 + (3  x) == 12
 + (4  x) == 16
 + (5  x) == 20
 + (6  x) == 24
 + (7  x) == 28
 + (8  x) == 32
 + (9  x) == 36
 +(10  x) == 40))
 +__builtin_abort ();
 +}
 +
 +void
 +test3 (unsigned int x)
 +{
 +  if ((0U  x) != 0U
 +  || (1U  x) != 16U
 +  || (2U  x) != 32U
 +  || (3U  x) != 48U
 +  || (4U  x) != 64U
 +  || (5U  x) != 80U
 +  || (6U  x) != 96U
 +  || (7U  x) != 112U
 +  || (8U  x) != 128U
 +  || (9U  x) != 144U
 +  || (10U  x) != 160U)
 +__builtin_abort ();
 +}
 +
 +void
 +test4 (unsigned int x)
 +{
 +  if (!((0U  x) == 0U
 +   || (1U  x) == 8U
 +   || (2U  x) == 16U
 +   || (3U  x) == 24U
 +   || (4U  x) == 32U
 +   || (5U  x) == 40U
 +   || (6U  x) == 48U
 +   || (7U  x) == 56U
 +   || (8U  x) == 64U
 +   || (9U  x) == 72U
 +   || (10U  x) == 80U))
 +__builtin_abort ();
 +}
 +
 +int
 +main (void)
 +{
 +  test1 (1);
 +  test2 (2);
 +  test3 (4U);
 +  test4 (3U);
 +}
 +
 +/* { dg-final { scan-tree-dump-not  original } } */
 +/* { dg-final { cleanup-tree-dump original } } */
 diff --git gcc/testsuite/gcc.dg/pr66299-2.c gcc/testsuite/gcc.dg/pr66299-2.c
 index e69de29..dde0549 100644
 --- gcc/testsuite/gcc.dg/pr66299-2.c
 +++ gcc/testsuite/gcc.dg/pr66299-2.c
 @@ -0,0 +1,34 @@
 +/* PR tree-optimization/66299 */
 +/* { dg-do run } */
 +/* { dg-options -fdump-tree-optimized -O } */
 +
 +void
 +test1 (int x, unsigned u)
 +{
 +  if ((1U  x) != 64
 +  || (2  x) != u
 +  || (x  x) != 384
 +  || (3  x) == 9
 +

[PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

2015-05-28 Thread Marek Polacek

This PR points out that we weren't able to optimize 1  x == 2 to just
x == 1.  This is my attempt to fix that: if we see (CST1  A) == CST2
and CST2 is a multiple of CST1, use log2 to get rid of the shift, but
only if the result of the shift is a natural number (including zero).

If CST2 is not a multiple of CST1, then the whole expression can be
discarded, but I'd like to do that as a follow-up.
(It would help if our current match.pd grammar allowed us to use else,
any plans on doing that?)

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-05-28  Marek Polacek  pola...@redhat.com

PR tree-optimization/66299
* match.pd ((CST1  A) == CST2 - A == log2 (CST2 / CST1),
(CST1  A) != CST2 - A != log2 (CST2 / CST1)): New
patterns.

* gcc.dg/pr66299-1.c: New test.
* gcc.dg/pr66299-2.c: New test.

diff --git gcc/match.pd gcc/match.pd
index abd7851..5d07a70 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -676,6 +676,19 @@ along with GCC; see the file COPYING3.  If not see
   (cmp (bit_and (lshift integer_onep @0) integer_onep) integer_zerop)
   (icmp @0 { build_zero_cst (TREE_TYPE (@0)); })))
 
+/* (CST1  A) == CST2 - A == log2 (CST2 / CST1)
+   (CST1  A) != CST2 - A != log2 (CST2 / CST1)
+   if CST2 is a multiple of CST1.  */
+(for cmp (ne eq)
+ (simplify
+  (cmp (lshift@3 INTEGER_CST@0 @1) INTEGER_CST@2)
+  (if ((TREE_CODE (@3) != SSA_NAME || has_single_use (@3))
+wi::multiple_of_p (@2, @0, TYPE_SIGN (type)))
+   (with {
+int shift = wi::exact_log2 (wi::div_trunc (@2, @0, TYPE_SIGN (type))); }
+   (if (shift != -1)
+(cmp @1 { build_int_cst (TREE_TYPE (@1), shift); }))
+
 /* Simplifications of conversions.  */
 
 /* Basic strip-useless-type-conversions / strip_nops.  */
diff --git gcc/testsuite/gcc.dg/pr66299-1.c gcc/testsuite/gcc.dg/pr66299-1.c
index e69de29..9d41275 100644
--- gcc/testsuite/gcc.dg/pr66299-1.c
+++ gcc/testsuite/gcc.dg/pr66299-1.c
@@ -0,0 +1,83 @@
+/* PR tree-optimization/66299 */
+/* { dg-do run } */
+/* { dg-options -fdump-tree-original } */
+
+void
+test1 (int x)
+{
+  if ((0  x) != 0
+  || (1  x) != 2
+  || (2  x) != 4
+  || (3  x) != 6
+  || (4  x) != 8
+  || (5  x) != 10
+  || (6  x) != 12
+  || (7  x) != 14
+  || (8  x) != 16
+  || (9  x) != 18
+  || (10  x) != 20)
+__builtin_abort ();
+}
+
+void
+test2 (int x)
+{
+  if (!((0  x) == 0
+ (1  x) == 4
+ (2  x) == 8
+ (3  x) == 12
+ (4  x) == 16
+ (5  x) == 20
+ (6  x) == 24
+ (7  x) == 28
+ (8  x) == 32
+ (9  x) == 36
+(10  x) == 40))
+__builtin_abort ();
+}
+
+void
+test3 (unsigned int x)
+{
+  if ((0U  x) != 0U
+  || (1U  x) != 16U
+  || (2U  x) != 32U
+  || (3U  x) != 48U
+  || (4U  x) != 64U
+  || (5U  x) != 80U
+  || (6U  x) != 96U
+  || (7U  x) != 112U
+  || (8U  x) != 128U
+  || (9U  x) != 144U
+  || (10U  x) != 160U)
+__builtin_abort ();
+}
+
+void
+test4 (unsigned int x)
+{
+  if (!((0U  x) == 0U
+   || (1U  x) == 8U
+   || (2U  x) == 16U
+   || (3U  x) == 24U
+   || (4U  x) == 32U
+   || (5U  x) == 40U
+   || (6U  x) == 48U
+   || (7U  x) == 56U
+   || (8U  x) == 64U
+   || (9U  x) == 72U
+   || (10U  x) == 80U))
+__builtin_abort ();
+}
+
+int
+main (void)
+{
+  test1 (1);
+  test2 (2);
+  test3 (4U);
+  test4 (3U);
+}
+
+/* { dg-final { scan-tree-dump-not  original } } */
+/* { dg-final { cleanup-tree-dump original } } */
diff --git gcc/testsuite/gcc.dg/pr66299-2.c gcc/testsuite/gcc.dg/pr66299-2.c
index e69de29..dde0549 100644
--- gcc/testsuite/gcc.dg/pr66299-2.c
+++ gcc/testsuite/gcc.dg/pr66299-2.c
@@ -0,0 +1,34 @@
+/* PR tree-optimization/66299 */
+/* { dg-do run } */
+/* { dg-options -fdump-tree-optimized -O } */
+
+void
+test1 (int x, unsigned u)
+{
+  if ((1U  x) != 64
+  || (2  x) != u
+  || (x  x) != 384
+  || (3  x) == 9
+  || (x  14) != 98304U
+  || (1  x) == 14
+  || (3  2) != 12)
+__builtin_abort ();
+}
+
+void
+test2 (int x)
+{
+  unsigned int t = ((unsigned int) 1U  x);
+  if (t != 2U)
+__builtin_abort ();
+}
+
+int
+main (void)
+{
+  test1 (6, 128U);
+  test2 (1);
+}
+
+/* { dg-final { scan-tree-dump-not  optimized } } */
+/* { dg-final { cleanup-tree-dump optimized } } */

Marek

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

Re: [PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

[PATCH] Optimize (CST1 A) == CST2 (PR tree-optimization/66299)

14 matches

Site Navigation

Mail list logo

Footer information