Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-16 Thread Richard Biener
On Tue, Oct 15, 2013 at 6:55 PM, Jeff Law l...@redhat.com wrote:
 On 10/15/13 10:53, Jakub Jelinek wrote:

 On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote:

 I noticed that we're now including rtl.h and tm_p.h in
 tree-ssa-reassoc.c, which seems wrong.


 Isn't that required for BRANCH_COST use?
 Other option would be to add some dummy wrapper around
 BRANCH_COST, put that wrapper into some file that already includes rtl.h
 and
 tm_p.h and just call it from there.

 Yea, looking at it now, I'm a bit surprised this hasn't been converted to a
 target hook which would avoid this problem entirely.

You got the job!

;)

Richard.

 jeff


Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-16 Thread Jeff Law

On 10/16/13 02:21, Richard Biener wrote:

On Tue, Oct 15, 2013 at 6:55 PM, Jeff Law l...@redhat.com wrote:

On 10/15/13 10:53, Jakub Jelinek wrote:


On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote:


I noticed that we're now including rtl.h and tm_p.h in
tree-ssa-reassoc.c, which seems wrong.



Isn't that required for BRANCH_COST use?
Other option would be to add some dummy wrapper around
BRANCH_COST, put that wrapper into some file that already includes rtl.h
and
tm_p.h and just call it from there.


Yea, looking at it now, I'm a bit surprised this hasn't been converted to a
target hook which would avoid this problem entirely.


You got the job!

Joys :-)

What's the policy on the GDFL stuff.  I've tried so hard to avoid having 
to worry about that rats nest that I have no idea what our policy is. 
Basically I just want to take the old docs for BRANCH_COST and re-use 
those.  Is that considered kosher with all the GDFL nonsense?


Jeff


Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-16 Thread Joseph S. Myers
On Wed, 16 Oct 2013, Jeff Law wrote:

 What's the policy on the GDFL stuff.  I've tried so hard to avoid having to
 worry about that rats nest that I have no idea what our policy is. Basically I
 just want to take the old docs for BRANCH_COST and re-use those.  Is that
 considered kosher with all the GDFL nonsense?

When moving documentation text from the manual into target.def (so it ends 
up in both target.def and tm.texi, with tm.texi.in just having an @hook 
line to show where the regeneration of tm.texi should put the text), CC 
one of the people listed as docstring relicensing maintainers and ask us 
to approve the GPL relicensing in that case.

-- 
Joseph S. Myers
jos...@codesourcery.com


RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-15 Thread Zhenqiang Chen


 -Original Message-
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Monday, October 14, 2013 4:49 PM
 To: Zhenqiang Chen
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2
-
 CST1) == 1 into ((X - CST1)  ~(CST2 - CST1)) == 0
 
 On Mon, Oct 14, 2013 at 03:10:12PM +0800, Zhenqiang Chen wrote:
 
 @@ -2131,6 +2133,155 @@ update_range_test (struct range_entry *range,
 struct range_entry *otherrange,
return true;
  }
 
 +/* Optimize X == CST1 || X == CST2
 +   if popcount (CST1 ^ CST2) == 1 into
 +   (X  ~(CST1 ^ CST2)) == (CST1  ~(CST1 ^ CST2)).
 +   Similarly for ranges.  E.g.
 +   X != 2  X != 3  X != 10  X != 11
 +   will be transformed by the previous optimization into
 +   (X - 2U) = 1U  (X - 10U) = 1U
 +   and this loop can transform that into
 +   ((X  ~8) - 2U) = 1U.  */
 +
 +static bool
 +try_transfer_range_tests_1 (enum tree_code opcode, int i, int j, tree
type,
 + tree lowi, tree lowj, tree highi, tree highj,
 + vecoperand_entry_t *ops,
 + struct range_entry *ranges)
 
 The function names are bad, you aren't transfering anything, but
optimizing.
 Please rename try_transfer_range_tests to optimize_range_tests_1 and
 try_transfer_range_tests_{1,2} to optimize_range_tests_{2,3} or perhaps
 better yet to optimize_range_tests_{xor,diff}.

try_transfer_range_tests is changed to optimize_range_tests_1
try_transfer_range_tests_1 is changed to optimize_range_tests_xor
try_transfer_range_tests_2 is changed to optimize_range_tests_diff

 Also, perhaps instead of passing ranges and i and j to these two functions
 you could pass struct range_entry *range1, struct range_entry *range2
 (caller would pass ranges + i, ranges + j).

Updated to rangei and rangej since we use i, j, lowi, lowj etc at other
places.
 
 +/* It does some common checks for function try_transfer_range_tests_1
 and
 +   try_transfer_range_tests_2.
 
 Please adjust the comment for the renaming.  Please change trans_option to
 bool optimize_xor.

Updated.

 +   if (trans_option == 1)
 + {
 +   if (try_transfer_range_tests_1 (opcode, i, j, type, lowi,
lowj,
 +   highi, highj, ops, ranges))
 + {
 +   any_changes = true;
 +   break;
 + }
 + }
 +   else if (trans_option == 2)
 + {
 +   if (try_transfer_range_tests_2 (opcode, i, j, type, lowi,
lowj,
 +   highi, highj, ops, ranges))
 + {
 +   any_changes = true;
 +   break;
 + }
 + }
 
 I'd prefer
   if (optimize_xor)
 any_changes
   = optimize_range_tests_xor (opcode, type, lowi, lowj, highi,
   highj, ops, ranges + i, ranges + j);
   else
 any_changes
   = optimize_range_tests_xor (opcode, type, lowi, lowj, highi,
   highj, ops, ranges + i, ranges + j);
   if (any_changes)
 break;

This is incorrect. The any_changes should be |= of the return values.

In the updated patch, I changed the code segment as

  bool changes;
  ...
  if (optimize_xor)
changes = optimize_range_tests_xor (opcode, type, lowi, lowj,
highi, highj, ops,
ranges + i, ranges + j);
  else
changes = optimize_range_tests_diff (opcode, type, lowi, lowj,
 highi, highj, ops,
 ranges + i, ranges + j);
  if (changes)
{
  any_changes = true;
  break;
}

Is it OK?

Thanks!
-Zhenqiang

 Ok with those changes.
 
   Jakub


reassoc-final.patch
Description: Binary data


Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-15 Thread Jakub Jelinek
On Tue, Oct 15, 2013 at 03:57:23PM +0800, Zhenqiang Chen wrote:
 Is it OK?

Ok, except two comments apparently still need updating.

+/* Optimize X == CST1 || X == CST2
+   if popcount (CST1 ^ CST2) == 1 into
+   (X  ~(CST1 ^ CST2)) == (CST1  ~(CST1 ^ CST2)).
+   Similarly for ranges.  E.g.
+   X != 2  X != 3  X != 10  X != 11
+   will be transformed by the previous optimization into
+   (X - 2U) = 1U  (X - 10U) = 1U
+   and this loop can transform that into
+   ((X  ~8) - 2U) = 1U.  */

Here the example is using != and , so it is transformed into:
   !((X - 2U) = 1U || (X - 10U) = 1U)
(everything is negated at the end, and note || instead of ,
with  it could fold into !(0))
and finally into:
   !(((X  ~8) - 2U) = 1U)
Or alternatively change the first expression into:
  X == 2 || X == 3 || X == 10 || X == 11
and the second to:
  (X - 2U) = 1U || (X - 10U) = 1U
and the third keep as is.

+/* Optimize X == CST1 || X == CST2
+   if popcount (CST2 - CST1) == 1 into
+   ((X - CST1)  ~(CST2 - CST1)) == 0.
+   Similarly for ranges.  E.g.
+   X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46
+   || X == 75 || X == 45
+   will be transformed by the previous optimization into
+   (X - 43U) = 3U  (X - 75U) = 3U
+   and this loop can transform that into
+   ((X - 43U)  ~(75U - 43U)) = 3U.  */

Here the example is using == and ||, so the only wrong thing in there
is that the second expression should be
   (X - 43U) = 3U || (X - 75U) = 3U

Ok with those changes.

Jakub


Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-15 Thread Jeff Law

On 10/11/13 20:11, Zhenqiang Chen wrote:




-Original Message-
From: Jeff Law [mailto:l...@redhat.com]
Sent: Friday, October 11, 2013 1:20 PM
To: Zhenqiang Chen
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2

-

CST1) == 1 into ((X - CST1)  ~(CST2 - CST1)) == 0

On 10/10/13 03:25, Zhenqiang Chen wrote:



It comes from Coremark. The code is:

if (NEXT_SYMBOL == '+' || NEXT_SYMBOL == '-')

I should have guessed ;-)




For ARM, there are three instructions rather than 4 (in thumb state).
For some older gcc, I got performance improvement on ARM chromebook.
But I can not reproduce the performance gain now.

I will collect logs during GCC bootstrap.

Thanks.  It doesn't have to be anything particularly complex.  When I'm
looking at a transformation I usually just put a printf when it triggers

and grep

for the string in a make log.


Thanks. I check the make log. There are 1906 occurrences.

Wow.  That's awsome.  Thanks,

jeff



Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-15 Thread Jeff Law

On 10/15/13 02:12, Jakub Jelinek wrote:

On Tue, Oct 15, 2013 at 03:57:23PM +0800, Zhenqiang Chen wrote:

Is it OK?


Ok, except two comments apparently still need updating.

+/* Optimize X == CST1 || X == CST2
+   if popcount (CST1 ^ CST2) == 1 into
+   (X  ~(CST1 ^ CST2)) == (CST1  ~(CST1 ^ CST2)).
+   Similarly for ranges.  E.g.
+   X != 2  X != 3  X != 10  X != 11
+   will be transformed by the previous optimization into
+   (X - 2U) = 1U  (X - 10U) = 1U
+   and this loop can transform that into
+   ((X  ~8) - 2U) = 1U.  */

Here the example is using != and , so it is transformed into:
!((X - 2U) = 1U || (X - 10U) = 1U)
(everything is negated at the end, and note || instead of ,
with  it could fold into !(0))
and finally into:
!(((X  ~8) - 2U) = 1U)
Or alternatively change the first expression into:
   X == 2 || X == 3 || X == 10 || X == 11
and the second to:
   (X - 2U) = 1U || (X - 10U) = 1U
and the third keep as is.

+/* Optimize X == CST1 || X == CST2
+   if popcount (CST2 - CST1) == 1 into
+   ((X - CST1)  ~(CST2 - CST1)) == 0.
+   Similarly for ranges.  E.g.
+   X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46
+   || X == 75 || X == 45
+   will be transformed by the previous optimization into
+   (X - 43U) = 3U  (X - 75U) = 3U
+   and this loop can transform that into
+   ((X - 43U)  ~(75U - 43U)) = 3U.  */

Here the example is using == and ||, so the only wrong thing in there
is that the second expression should be
(X - 43U) = 3U || (X - 75U) = 3U

Ok with those changes.
I noticed that we're now including rtl.h and tm_p.h in 
tree-ssa-reassoc.c, which seems wrong.


Jeff


Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-15 Thread Jakub Jelinek
On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote:
 I noticed that we're now including rtl.h and tm_p.h in
 tree-ssa-reassoc.c, which seems wrong.

Isn't that required for BRANCH_COST use?
Other option would be to add some dummy wrapper around
BRANCH_COST, put that wrapper into some file that already includes rtl.h and
tm_p.h and just call it from there.

Jakub


Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-15 Thread Jeff Law

On 10/15/13 10:53, Jakub Jelinek wrote:

On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote:

I noticed that we're now including rtl.h and tm_p.h in
tree-ssa-reassoc.c, which seems wrong.


Isn't that required for BRANCH_COST use?
Other option would be to add some dummy wrapper around
BRANCH_COST, put that wrapper into some file that already includes rtl.h and
tm_p.h and just call it from there.
Yea, looking at it now, I'm a bit surprised this hasn't been converted 
to a target hook which would avoid this problem entirely.


jeff


Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-15 Thread Jeff Law

On 10/15/13 02:12, Jakub Jelinek wrote:

On Tue, Oct 15, 2013 at 03:57:23PM +0800, Zhenqiang Chen wrote:

Is it OK?


Ok, except two comments apparently still need updating.

+/* Optimize X == CST1 || X == CST2
+   if popcount (CST1 ^ CST2) == 1 into
+   (X  ~(CST1 ^ CST2)) == (CST1  ~(CST1 ^ CST2)).
+   Similarly for ranges.  E.g.
+   X != 2  X != 3  X != 10  X != 11
+   will be transformed by the previous optimization into
+   (X - 2U) = 1U  (X - 10U) = 1U
+   and this loop can transform that into
+   ((X  ~8) - 2U) = 1U.  */

Here the example is using != and , so it is transformed into:
!((X - 2U) = 1U || (X - 10U) = 1U)
(everything is negated at the end, and note || instead of ,
with  it could fold into !(0))
and finally into:
!(((X  ~8) - 2U) = 1U)
Or alternatively change the first expression into:
   X == 2 || X == 3 || X == 10 || X == 11
and the second to:
   (X - 2U) = 1U || (X - 10U) = 1U
and the third keep as is.

+/* Optimize X == CST1 || X == CST2
+   if popcount (CST2 - CST1) == 1 into
+   ((X - CST1)  ~(CST2 - CST1)) == 0.
+   Similarly for ranges.  E.g.
+   X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46
+   || X == 75 || X == 45
+   will be transformed by the previous optimization into
+   (X - 43U) = 3U  (X - 75U) = 3U
+   and this loop can transform that into
+   ((X - 43U)  ~(75U - 43U)) = 3U.  */

Here the example is using == and ||, so the only wrong thing in there
is that the second expression should be
(X - 43U) = 3U || (X - 75U) = 3U

Ok with those changes.
I fixed up the comments  ChangeLog entry and committed on Zhenqiang's 
behalf.


jeff



RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-14 Thread Zhenqiang Chen
Patch and test cases are updated according to your comments.

Bootstrap and no make check regression on X86-64.

ChangeLog:
2013-10-14  Zhenqiang Chen  zhenqiang.c...@arm.com

* tree-ssa-reassoc.c (try_transfer_range_tests_1): New function,
extracted from optimize_range_tests
* (try_transfer_range_tests_2): New function.
* (try_transfer_range_tests): New function, extracted from
optimize_range_tests and calls try_transfer_range_tests_1 and
try_transfer_range_tests_2,
* (optimize_range_tests): Use try_transfer_range_tests.

testsuite/ChangeLog:
2013-10-14  Zhenqiang Chen  zhenqiang.c...@arm.com

* gcc.dg/tree-ssa/reassoc-32.c: New test case.
* gcc.dg/tree-ssa/reassoc-33.c: New test case.
* gcc.dg/tree-ssa/reassoc-34.c: New test case.
* gcc.dg/tree-ssa/reassoc-35.c: New test case.
* gcc.dg/tree-ssa/reassoc-36.c: New test case.

 -Original Message-
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Saturday, October 12, 2013 3:40 PM
 To: Zhenqiang Chen
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2
-
 CST1) == 1 into ((X - CST1)  ~(CST2 - CST1)) == 0
 
 On Sat, Oct 12, 2013 at 10:08:12AM +0800, Zhenqiang Chen wrote:
  As you had mentioned, the transition in this patch does not reduce
  instructions. But the preexisting optimization does. So we prefer to
  do the preexisting optimization first. E.g.
 
  X == 10 || X == 12 || X == 26
 
 Ok, that makes sense.
 
 @@ -2279,6 +2275,71 @@ optimize_range_tests (enum tree_code opcode,
   }
  }
 
 +  /* Optimize X == CST1 || X == CST2
 + if popcount (CST2 - CST1) == 1 into
 + ((X - CST1)  ~(CST2 - CST1)) == 0.  */
 
 Mention here also similarly to the other comment that it works also with
 ranges.
 
 +  if (BRANCH_COST (optimize_function_for_speed_p (cfun), false) = 2)
 +for (i = first; i  length; i++)
 +  {
 + tree lowi, highi, lowj, highj, type, tem1, tem2, mask;
 +
 + if (ranges[i].exp == NULL_TREE || ranges[i].in_p)
 +   continue;
 + type = TREE_TYPE (ranges[i].exp);
 + if (!INTEGRAL_TYPE_P (type))
 +   continue;
 + lowi = ranges[i].low;
 + if (lowi == NULL_TREE)
 +   continue;
 
 The other loop has here:
   if (lowi == NULL_TREE)
 lowi = TYPE_MIN_VALUE (type);
 instead, which is better, can you please change it?
 
 + highi = ranges[i].high;
 + if (highi == NULL_TREE)
 +   continue;
 + for (j = i + 1; j  length  j  i + 64; j++)
 +   {
 + lowj = ranges[j].low;
 + if (lowj == NULL_TREE)
 +   continue;
 + highj = ranges[j].high;
 + if (highj == NULL_TREE)
 +   continue;
 
 The other loop has
   if (highj == NULL_TREE)
 highj = TYPE_MAX_VALUE (type);
 here instead.
 
 + if (ranges[j].exp == NULL_TREE || ranges[j].in_p
 + || (ranges[i].exp != ranges[j].exp))
 +   continue;
 
 Can you please move this test the lowj = assignment, and remove the
 ranges[j].exp == NULL_TREE test (both here and in the earlier loop)?  I
mean,
 we've checked at the beginning of for (i = first; i  length; i++) loop
that
 ranges[i].exp is not NULL_TREE, and if ranges[j].exp is NULL_TREE, it will
be
 different than ranges[i].exp.
 So
   if (ranges[i].exp != ranges[j].exp || ranges[j].in_p)
 continue;
 (and please also collapse the two checks in the first loop, so that both
look
 similar).
 
 + /* Check lowj  highi.  */
 + tem1 = fold_binary (GT_EXPR, boolean_type_node,
 +lowj, highi);
 + if (tem1 == NULL_TREE || !integer_onep (tem1))
 +   continue;
 + /* Check highi - lowi == highj - lowj.  */
 + tem1 = fold_binary (MINUS_EXPR, type, highi, lowi);
 + if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST)
 +   continue;
 + tem2 = fold_binary (MINUS_EXPR, type, highj, lowj);
 + if (tem2 == NULL_TREE || TREE_CODE (tem2) != INTEGER_CST)
 +   continue;
 + if (!tree_int_cst_equal (tem1, tem2))
 +   continue;
 + /* Check popcount (lowj - lowi) == 1.  */
 + tem1 = fold_binary (MINUS_EXPR, type, lowj, lowi);
 + if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST)
 +   continue;
 + if (tree_log2 (tem1)  0)
 +   continue;
 + mask = fold_build1 (BIT_NOT_EXPR, type, tem1);
 + tem1 = fold_binary (MINUS_EXPR, type, ranges[i].exp, lowi);
 + tem1 = fold_build2 (BIT_AND_EXPR, type, tem1, mask);
 + lowi = build_int_cst (type, 0);
 
 Please use lowj instead of lowi here.  Because, if update_range_test
fails, we
 continue looking for further matches in following ranges, and while lowj
will
 be computed again, lowi will be assumed to contain the low bound of the
 first range.
 
 + if (update_range_test (ranges + i, ranges + j, 1, opcode, ops

Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-14 Thread Jakub Jelinek
On Mon, Oct 14, 2013 at 03:10:12PM +0800, Zhenqiang Chen wrote:

@@ -2131,6 +2133,155 @@ update_range_test (struct range_entry *range, struct 
range_entry *otherrange,
   return true;
 }
 
+/* Optimize X == CST1 || X == CST2
+   if popcount (CST1 ^ CST2) == 1 into
+   (X  ~(CST1 ^ CST2)) == (CST1  ~(CST1 ^ CST2)).
+   Similarly for ranges.  E.g.
+   X != 2  X != 3  X != 10  X != 11
+   will be transformed by the previous optimization into
+   (X - 2U) = 1U  (X - 10U) = 1U
+   and this loop can transform that into
+   ((X  ~8) - 2U) = 1U.  */
+
+static bool
+try_transfer_range_tests_1 (enum tree_code opcode, int i, int j, tree type,
+   tree lowi, tree lowj, tree highi, tree highj,
+   vecoperand_entry_t *ops,
+   struct range_entry *ranges)

The function names are bad, you aren't transfering anything, but optimizing.
Please rename try_transfer_range_tests to optimize_range_tests_1 and
try_transfer_range_tests_{1,2} to optimize_range_tests_{2,3} or perhaps
better yet to optimize_range_tests_{xor,diff}.

Also, perhaps instead of passing ranges and i and j to these two functions
you could pass struct range_entry *range1, struct range_entry *range2
(caller would pass ranges + i, ranges + j).

+/* It does some common checks for function try_transfer_range_tests_1 and
+   try_transfer_range_tests_2.

Please adjust the comment for the renaming.  Please change trans_option
to bool optimize_xor.

+ if (trans_option == 1)
+   {
+ if (try_transfer_range_tests_1 (opcode, i, j, type, lowi, lowj,
+ highi, highj, ops, ranges))
+   {
+ any_changes = true;
+ break;
+   }
+   }
+ else if (trans_option == 2)
+   {
+ if (try_transfer_range_tests_2 (opcode, i, j, type, lowi, lowj,
+ highi, highj, ops, ranges))
+   {
+ any_changes = true;
+ break;
+   }
+   }

I'd prefer
if (optimize_xor)
  any_changes
= optimize_range_tests_xor (opcode, type, lowi, lowj, highi,
highj, ops, ranges + i, ranges + j);
else
  any_changes
= optimize_range_tests_xor (opcode, type, lowi, lowj, highi,
highj, ops, ranges + i, ranges + j);
if (any_changes)
  break;

Ok with those changes.

Jakub


Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-12 Thread Jakub Jelinek
On Sat, Oct 12, 2013 at 10:08:12AM +0800, Zhenqiang Chen wrote:
 As you had mentioned, the transition in this patch does not reduce
 instructions. But the preexisting optimization does. So we prefer to do the
 preexisting optimization first. E.g.
 
 X == 10 || X == 12 || X == 26

Ok, that makes sense.

@@ -2279,6 +2275,71 @@ optimize_range_tests (enum tree_code opcode,
}
 }
 
+  /* Optimize X == CST1 || X == CST2
+ if popcount (CST2 - CST1) == 1 into
+ ((X - CST1)  ~(CST2 - CST1)) == 0.  */

Mention here also similarly to the other comment that it works also
with ranges.

+  if (BRANCH_COST (optimize_function_for_speed_p (cfun), false) = 2)
+for (i = first; i  length; i++)
+  {
+   tree lowi, highi, lowj, highj, type, tem1, tem2, mask;
+
+   if (ranges[i].exp == NULL_TREE || ranges[i].in_p)
+ continue;
+   type = TREE_TYPE (ranges[i].exp);
+   if (!INTEGRAL_TYPE_P (type))
+ continue;
+   lowi = ranges[i].low;
+   if (lowi == NULL_TREE)
+ continue;

The other loop has here:
  if (lowi == NULL_TREE)
lowi = TYPE_MIN_VALUE (type);
instead, which is better, can you please change it?

+   highi = ranges[i].high;
+   if (highi == NULL_TREE)
+ continue;
+   for (j = i + 1; j  length  j  i + 64; j++)
+ {
+   lowj = ranges[j].low;
+   if (lowj == NULL_TREE)
+ continue;
+   highj = ranges[j].high;
+   if (highj == NULL_TREE)
+ continue;

The other loop has
if (highj == NULL_TREE)
  highj = TYPE_MAX_VALUE (type);
here instead.

+   if (ranges[j].exp == NULL_TREE || ranges[j].in_p
+   || (ranges[i].exp != ranges[j].exp))
+ continue;

Can you please move this test the lowj = assignment, and
remove the ranges[j].exp == NULL_TREE test (both here and
in the earlier loop)?  I mean, we've checked at the
beginning of for (i = first; i  length; i++) loop that
ranges[i].exp is not NULL_TREE, and if ranges[j].exp is NULL_TREE,
it will be different than ranges[i].exp.
So
if (ranges[i].exp != ranges[j].exp || ranges[j].in_p)
  continue;
(and please also collapse the two checks in the first loop,
so that both look similar).

+   /* Check lowj  highi.  */
+   tem1 = fold_binary (GT_EXPR, boolean_type_node,
+  lowj, highi);
+   if (tem1 == NULL_TREE || !integer_onep (tem1))
+ continue;
+   /* Check highi - lowi == highj - lowj.  */
+   tem1 = fold_binary (MINUS_EXPR, type, highi, lowi);
+   if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST)
+ continue;
+   tem2 = fold_binary (MINUS_EXPR, type, highj, lowj);
+   if (tem2 == NULL_TREE || TREE_CODE (tem2) != INTEGER_CST)
+ continue;
+   if (!tree_int_cst_equal (tem1, tem2))
+ continue;
+   /* Check popcount (lowj - lowi) == 1.  */
+   tem1 = fold_binary (MINUS_EXPR, type, lowj, lowi);
+   if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST)
+ continue;
+   if (tree_log2 (tem1)  0)
+ continue;
+   mask = fold_build1 (BIT_NOT_EXPR, type, tem1);
+   tem1 = fold_binary (MINUS_EXPR, type, ranges[i].exp, lowi);
+   tem1 = fold_build2 (BIT_AND_EXPR, type, tem1, mask);
+   lowi = build_int_cst (type, 0);

Please use lowj instead of lowi here.  Because, if update_range_test
fails, we continue looking for further matches in following ranges,
and while lowj will be computed again, lowi will be assumed to contain
the low bound of the first range.

+   if (update_range_test (ranges + i, ranges + j, 1, opcode, ops, tem1,
+  ranges[i].in_p, lowi, tem2,

And here too.

Or, alternatively to avoid duplication, you could put the whole
for (i = first; i  length; i++)
loop from the first optimization into another
int i;
  for (pass = 0; pass  (BRANCH_COST (optimize_function_for_speed_p (cfun),
  false) = 2) + 1; pass++)
  {
  }
loop, use whatever is common in the loop unconditionally, and just
conditionalize the rest on pass == 0 vs. pass == 1.
Or maybe even better just move this whole loop into a new function,
with ranges, opcode, first, length arguments plus another (bool?) argument
which of the two optimizations you want to perform, and return the bool
any_changes.  Then you wouldn't run into line length issues that you could
run into with the extra loop.

--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-33.c
@@ -0,0 +1,38 @@
+/* { dg-do run { target { ! m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* 
picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* 
xtensa*-*-*} } } */
+
+/* { dg-options -O2 -fno-inline -fdump-tree-reassoc1-details } */
+/* { dg-additional-options -mbranch-cost=2 { target avr-*-* } } */
+
+int 

RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-11 Thread Zhenqiang Chen


 -Original Message-
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Thursday, October 10, 2013 7:13 PM
 To: Zhenqiang Chen
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2
-
 CST1) == 1 into ((X - CST1)  ~(CST2 - CST1)) == 0
 
 On Mon, Aug 05, 2013 at 04:08:58PM +0800, Zhenqiang Chen wrote:
  ChangeLog
  2013-08-05  Zhenqiang Chen  zhenqiang.c...@arm.com
 
  * tree-ssa-reassoc.c (optimize_range_tests): Reasociate
  X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into
  ((X - CST1)  ~(CST2 - CST1)) == 0.
 
  testsuite/ChangeLog
  2013-08-05  Zhenqiang Chen  zhenqiang.c...@arm.com
 
  * gcc.dg/reassoc1.c: New testcase.
 
 +  /* Optimize X == CST1 || X == CST2
 + if popcount (CST2 - CST1) == 1 into
 + ((X - CST1)  ~(CST2 - CST1)) == 0.  */  if (!any_changes 
 + (opcode == BIT_IOR_EXPR))
 
 Wouldn't it be better to put it into the same loop that handles the other
 merges, rather than separate?  

As you had mentioned, the transition in this patch does not reduce
instructions. But the preexisting optimization does. So we prefer to do the
preexisting optimization first. E.g.

X == 10 || X == 12 || X == 26

The preexisting one will optimize X == 10 || X == 26. And the patch will
transfer X == 10 || X == 12.

So I prefer to separate them.

 Why the !any_changes guard?  Why opcode
 == BIT_IOR_EXPR guard?  Can't you optimize X != CST1  X != CST2 if
 popcount (CST2 - CST1) == 1 similarly into ((X - CST1)  ~(CST2 - CST1))
!= 0?

They are not necessary. Patch is updated to check only the BRANCH_COST.
 
 Also, like the preexisting optimization has been generalized for ranges,
can't
 you generalize this one too?
 
 I mean if you have say:
 (X - 43U) = 3U || (X - 75U) = 3U
 (originally e.g. X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X
== 46 ||
 X == 75 || X == 45) where popcount (75U - 43U) == 1, can't this be:
 ((X - 43U)  ~(75U - 43U)) = 3U
 
 For X in [LOWCST1, HIGHCST1] || X in [LOWCST2, HIGHCST2] the
 requirements for this optimization would be
 1) LOWCST2  HIGHCST1
 2) HIGHCST1 - LOWCST1 == HIGHCST2 - LOWCST2
 3) popcount (LOWCST2 - LOWCST1) == 1
 
 1) and 2) are the same requirements for the other optimization, while 3)
 would be specific to this and would be used only if popcount (LOWCST1 ^
 LOWCST2) != 1.
 Because of 1), LOWCST2 - LOWCST1 should be bigger than
 HIGHCST1 - LOWCST1 (i.e. the value we = against in the end), thus IMHO it
 should work fine even after generalization.

Patch is updated.

  +if (tree_log2 (tem)  0)
 +   continue;
 
 This is I guess cheaper than what I was doing there:
   gcc_checking_assert (!integer_zerop (lowxor));
   tem = fold_binary (MINUS_EXPR, type, lowxor,
  build_int_cst (type, 1));
   if (tem == NULL_TREE)
 continue;
   tem = fold_binary (BIT_AND_EXPR, type, lowxor, tem);
   if (tem == NULL_TREE || !integer_zerop (tem))
 continue;
 to check for popcount (x) == 1.  Can you replace it even in the
preexisting
 code?
   if (tree_log2 (lowxor)  0)
 continue;

Updated.

 Of course, if the two optimizations are merged into the same loop, then
 some of the continue will need to be replaced by just conditional code
inside
 of the loop, handling the two different optimizations.
 
 Thanks for working on this.
 
   Jakub

Bootstrap and no make check regression on X86-64 and ARM.

Thanks!
-Zhenqiang

reassoc-new.patch
Description: Binary data


RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-11 Thread Zhenqiang Chen


 -Original Message-
 From: Jeff Law [mailto:l...@redhat.com]
 Sent: Friday, October 11, 2013 1:20 PM
 To: Zhenqiang Chen
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2
-
 CST1) == 1 into ((X - CST1)  ~(CST2 - CST1)) == 0
 
 On 10/10/13 03:25, Zhenqiang Chen wrote:
 
 
  It comes from Coremark. The code is:
 
  if (NEXT_SYMBOL == '+' || NEXT_SYMBOL == '-')
 I should have guessed ;-)
 
 
 
  For ARM, there are three instructions rather than 4 (in thumb state).
  For some older gcc, I got performance improvement on ARM chromebook.
  But I can not reproduce the performance gain now.
 
  I will collect logs during GCC bootstrap.
 Thanks.  It doesn't have to be anything particularly complex.  When I'm
 looking at a transformation I usually just put a printf when it triggers
and grep
 for the string in a make log.

Thanks. I check the make log. There are 1906 occurrences.
 
 Depending on what I'm doing, I may dig more deeply into the situations
 where its triggering to make sure I fully understand the primary and
 secondary effects of a transformation which often leads to tweaking the
 implementation.  I don't think that level of rigor is needed here.
 
 Jeff
 






Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-10 Thread Richard Biener
On Thu, Oct 10, 2013 at 5:04 AM, Jeff Law l...@redhat.com wrote:
 On 08/05/13 02:08, Zhenqiang Chen wrote:

 Hi

 The patch reassociates X == CST1 || X == CST2 if popcount (CST2 - CST1) ==
 1
 into ((X - CST1)  ~(CST2 - CST1)) == 0.

 Bootstrap on x86-64 and ARM chromebook.
 No make check regression on x86-64 and panda board.

 For some targets/options, the (X == CST1 || X == CST2) might be
 converted
 to if (x == CST1) else if (x == CST2) at the beginning. In such case,
 the
 patch will not be executed. It is hard to write a reliable testcase to
 detect the transformation. So the testcase does not check the dump logs
 from
 reassoc1 pass. It just checks the runtime result.

 Is it OK for trunk?

 Thanks!
 -Zhenqiang

 ChangeLog
 2013-08-05  Zhenqiang Chen  zhenqiang.c...@arm.com

 * tree-ssa-reassoc.c (optimize_range_tests): Reasociate
 X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into
 ((X - CST1)  ~(CST2 - CST1)) == 0.

 testsuite/ChangeLog
 2013-08-05  Zhenqiang Chen  zhenqiang.c...@arm.com

 * gcc.dg/reassoc1.c: New testcase.

 This is an interesting transformation.  I suspect we'd want to gate it on
 something like BRANCH_COST.  For a related example, see how we handle (a !=
 0) || (b != 0) - (a | b) != 0 in fold-const.c.  I suspect the conditions
 under which we want to do your transformation are going to be similar if not
 the same as those transformations in fold-const.c.

 Note I've been suggesting the bits I'm referring to in fold-const.c move out
 into the tree-ssa optimizers.  If they fit well into tree-ssa-reassoc.c I'd
 look favorably upon a patch which moved them.

In particular I strongly vote against putting more target dependent
transformations into fold-const.c.

Richard.

 As far as testing, the way to go will be to look at the reassoc1 dumps, but
 skip the test on targets with the wrong branch cost. tree-ssa/vrp87.c has
 a suitable condition to filter out the test on targets were the branch cost
 is too small.

 Out of curiosity, what prompted you to make this transformation?  Was it
 mostly to deal with a codesize problem or is it a significant win on some
 hot code in a benchmark, or something else?  Closely related, have you done
 anything like instrument the transformation to see how often it applies
 during a GCC bootstrap?

 jeff




RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-10 Thread Zhenqiang Chen


 -Original Message-
 From: Jeff Law [mailto:l...@redhat.com]
 Sent: Thursday, October 10, 2013 11:05 AM
 To: Zhenqiang Chen; gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2
-
 CST1) == 1 into ((X - CST1)  ~(CST2 - CST1)) == 0
 
 On 08/05/13 02:08, Zhenqiang Chen wrote:
  Hi
 
  The patch reassociates X == CST1 || X == CST2 if popcount (CST2 -
  CST1) == 1 into ((X - CST1)  ~(CST2 - CST1)) == 0.
 
  Bootstrap on x86-64 and ARM chromebook.
  No make check regression on x86-64 and panda board.
 
  For some targets/options, the (X == CST1 || X == CST2) might be
  converted to if (x == CST1) else if (x == CST2) at the beginning. In
  such case, the patch will not be executed. It is hard to write a
  reliable testcase to detect the transformation. So the testcase does
  not check the dump logs from
  reassoc1 pass. It just checks the runtime result.
 
  Is it OK for trunk?
 
  Thanks!
  -Zhenqiang
 
  ChangeLog
  2013-08-05  Zhenqiang Chen  zhenqiang.c...@arm.com
 
  * tree-ssa-reassoc.c (optimize_range_tests): Reasociate
  X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into
  ((X - CST1)  ~(CST2 - CST1)) == 0.
 
  testsuite/ChangeLog
  2013-08-05  Zhenqiang Chen  zhenqiang.c...@arm.com
 
  * gcc.dg/reassoc1.c: New testcase.
 This is an interesting transformation.  I suspect we'd want to gate it on
 something like BRANCH_COST.  For a related example, see how we handle
 (a != 0) || (b != 0) - (a | b) != 0 in fold-const.c.  I suspect the
conditions
 under which we want to do your transformation are going to be similar if
not
 the same as those transformations in fold-const.c.

Thank you for the comments.

I will update the patch to check BRANCH_COST.

 Note I've been suggesting the bits I'm referring to in fold-const.c move
out
 into the tree-ssa optimizers.  If they fit well into tree-ssa-reassoc.c
I'd look
 favorably upon a patch which moved them.

The code is similar with the code (in tree-ssa-reassoc.c) for 
 Optimize X == CST1 || X == CST2
 if popcount (CST1 ^ CST2) == 1 into
 (X  ~(CST1 ^ CST2)) == (CST1  ~(CST1 ^ CST2))
 
 As far as testing, the way to go will be to look at the reassoc1 dumps,
but skip
 the test on targets with the wrong branch cost.
 tree-ssa/vrp87.c has a suitable condition to filter out the test on
targets were
 the branch cost is too small.

I will update the test case.

 Out of curiosity, what prompted you to make this transformation?  Was it
 mostly to deal with a codesize problem or is it a significant win on some
hot
 code in a benchmark, or something else?  Closely related, have you done
 anything like instrument the transformation to see how often it applies
 during a GCC bootstrap?
 
It comes from Coremark. The code is:

if (NEXT_SYMBOL == '+' || NEXT_SYMBOL == '-')

For ARM, there are three instructions rather than 4 (in thumb state).
For some older gcc, I got performance improvement on ARM chromebook. But I
can not reproduce the performance gain now.

I will collect logs during GCC bootstrap.

Thanks!
-Zhenqiang





Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-10 Thread Jakub Jelinek
On Thu, Oct 10, 2013 at 05:25:01PM +0800, Zhenqiang Chen wrote:
  Note I've been suggesting the bits I'm referring to in fold-const.c move
 out
  into the tree-ssa optimizers.  If they fit well into tree-ssa-reassoc.c
 I'd look
  favorably upon a patch which moved them.
 
 The code is similar with the code (in tree-ssa-reassoc.c) for 
  Optimize X == CST1 || X == CST2
  if popcount (CST1 ^ CST2) == 1 into
  (X  ~(CST1 ^ CST2)) == (CST1  ~(CST1 ^ CST2))

Yeah.  Though, that is one operation cheaper than this, the above
is replacing two comparisons with constants plus one || with
one arithmetic operation (with constant) plus one comparison of constant,
I think that should be always a win (ok, it is one arithmetic operation
more expensive if X == CST1 all the time, but otherwise it is likely
cheaper).  While in your case it is two arithmetic operations plus
comparison with constant, so perhaps for very cheap BRANCH_COST it might
not be beneficial.

Jakub


Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-10 Thread Jakub Jelinek
On Mon, Aug 05, 2013 at 01:39:50AM -0700, Andrew Pinski wrote:
 Seems like a better place to put this is inside tree-ssa-ifcombine.c
 which handles the case where if(a || b) is split up into if(a) else
 if(b).
 Moving it into tree-ssa-ifcombine.c allows for code to be optimized
 which was written using the if(a) else if(b) form.

No, tree-ssa-reassoc.c now handles both intra-bb and inter-bb range test
optimizations.

Jakub


Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-10 Thread Jakub Jelinek
On Mon, Aug 05, 2013 at 04:08:58PM +0800, Zhenqiang Chen wrote:
 ChangeLog
 2013-08-05  Zhenqiang Chen  zhenqiang.c...@arm.com
 
   * tree-ssa-reassoc.c (optimize_range_tests): Reasociate
   X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into
   ((X - CST1)  ~(CST2 - CST1)) == 0.
 
 testsuite/ChangeLog
 2013-08-05  Zhenqiang Chen  zhenqiang.c...@arm.com
 
   * gcc.dg/reassoc1.c: New testcase.

+  /* Optimize X == CST1 || X == CST2
+ if popcount (CST2 - CST1) == 1 into
+ ((X - CST1)  ~(CST2 - CST1)) == 0.  */
+  if (!any_changes  (opcode == BIT_IOR_EXPR))

Wouldn't it be better to put it into the same loop that handles
the other merges, rather than separate?  Why the !any_changes
guard?  Why opcode == BIT_IOR_EXPR guard?  Can't you optimize
X != CST1  X != CST2 if popcount (CST2 - CST1) == 1
similarly into ((X - CST1)  ~(CST2 - CST1)) != 0?

Also, like the preexisting optimization has been generalized for ranges,
can't you generalize this one too?

I mean if you have say:
(X - 43U) = 3U || (X - 75U) = 3U
(originally e.g. X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46 
|| X == 75 || X == 45)
where popcount (75U - 43U) == 1, can't this be:
((X - 43U)  ~(75U - 43U)) = 3U

For X in [LOWCST1, HIGHCST1] || X in [LOWCST2, HIGHCST2]
the requirements for this optimization would be
1) LOWCST2  HIGHCST1
2) HIGHCST1 - LOWCST1 == HIGHCST2 - LOWCST2
3) popcount (LOWCST2 - LOWCST1) == 1

1) and 2) are the same requirements for the other
optimization, while 3) would be specific to this
and would be used only if popcount (LOWCST1 ^ LOWCST2) != 1.
Because of 1), LOWCST2 - LOWCST1 should be bigger than
HIGHCST1 - LOWCST1 (i.e. the value we = against in the end),
thus IMHO it should work fine even after generalization.

+   if (tree_log2 (tem)  0)
+ continue;

This is I guess cheaper than what I was doing there:
  gcc_checking_assert (!integer_zerop (lowxor));
  tem = fold_binary (MINUS_EXPR, type, lowxor,
 build_int_cst (type, 1));
  if (tem == NULL_TREE)
continue;
  tem = fold_binary (BIT_AND_EXPR, type, lowxor, tem);
  if (tem == NULL_TREE || !integer_zerop (tem))
continue;
to check for popcount (x) == 1.  Can you replace it even in the
preexisting code?
if (tree_log2 (lowxor)  0)
  continue;
?
Of course, if the two optimizations are merged into the same loop,
then some of the continue will need to be replaced by just conditional
code inside of the loop, handling the two different optimizations.

Thanks for working on this.

Jakub


Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-10 Thread Jeff Law

On 10/10/13 03:25, Zhenqiang Chen wrote:



It comes from Coremark. The code is:

if (NEXT_SYMBOL == '+' || NEXT_SYMBOL == '-')

I should have guessed ;-)




For ARM, there are three instructions rather than 4 (in thumb state).
For some older gcc, I got performance improvement on ARM chromebook. But I
can not reproduce the performance gain now.

I will collect logs during GCC bootstrap.
Thanks.  It doesn't have to be anything particularly complex.  When I'm 
looking at a transformation I usually just put a printf when it triggers 
and grep for the string in a make log.


Depending on what I'm doing, I may dig more deeply into the situations 
where its triggering to make sure I fully understand the primary and 
secondary effects of a transformation which often leads to tweaking the 
implementation.  I don't think that level of rigor is needed here.


Jeff



Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-10 Thread Jeff Law

On 10/10/13 04:10, Jakub Jelinek wrote:

On Thu, Oct 10, 2013 at 05:25:01PM +0800, Zhenqiang Chen wrote:

Note I've been suggesting the bits I'm referring to in fold-const.c move

out

into the tree-ssa optimizers.  If they fit well into tree-ssa-reassoc.c

I'd look

favorably upon a patch which moved them.


The code is similar with the code (in tree-ssa-reassoc.c) for
  Optimize X == CST1 || X == CST2
  if popcount (CST1 ^ CST2) == 1 into
  (X  ~(CST1 ^ CST2)) == (CST1  ~(CST1 ^ CST2))


Yeah.  Though, that is one operation cheaper than this, the above
is replacing two comparisons with constants plus one || with
one arithmetic operation (with constant) plus one comparison of constant,
I think that should be always a win (ok, it is one arithmetic operation
more expensive if X == CST1 all the time, but otherwise it is likely
cheaper).  While in your case it is two arithmetic operations plus
comparison with constant, so perhaps for very cheap BRANCH_COST it might
not be beneficial.
Wow, I had no idea we had that kind of transformation in tree-reassoc 
already.


Jeff


Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-10 Thread Jeff Law

On 10/10/13 05:12, Jakub Jelinek wrote:

On Mon, Aug 05, 2013 at 04:08:58PM +0800, Zhenqiang Chen wrote:

ChangeLog
2013-08-05  Zhenqiang Chen  zhenqiang.c...@arm.com

* tree-ssa-reassoc.c (optimize_range_tests): Reasociate
X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into
((X - CST1)  ~(CST2 - CST1)) == 0.

testsuite/ChangeLog
2013-08-05  Zhenqiang Chen  zhenqiang.c...@arm.com

* gcc.dg/reassoc1.c: New testcase.


+  /* Optimize X == CST1 || X == CST2
+ if popcount (CST2 - CST1) == 1 into
+ ((X - CST1)  ~(CST2 - CST1)) == 0.  */
+  if (!any_changes  (opcode == BIT_IOR_EXPR))

Wouldn't it be better to put it into the same loop that handles
the other merges, rather than separate?  Why the !any_changes
guard?  Why opcode == BIT_IOR_EXPR guard?  Can't you optimize
X != CST1  X != CST2 if popcount (CST2 - CST1) == 1
similarly into ((X - CST1)  ~(CST2 - CST1)) != 0?

Also, like the preexisting optimization has been generalized for ranges,
can't you generalize this one too?

I mean if you have say:
(X - 43U) = 3U || (X - 75U) = 3U
(originally e.g. X == 43 || X == 76 || X == 44 || X == 78 || X == 77 || X == 46 
|| X == 75 || X == 45)
where popcount (75U - 43U) == 1, can't this be:
((X - 43U)  ~(75U - 43U)) = 3U

For X in [LOWCST1, HIGHCST1] || X in [LOWCST2, HIGHCST2]
the requirements for this optimization would be
1) LOWCST2  HIGHCST1
2) HIGHCST1 - LOWCST1 == HIGHCST2 - LOWCST2
3) popcount (LOWCST2 - LOWCST1) == 1

1) and 2) are the same requirements for the other
optimization, while 3) would be specific to this
and would be used only if popcount (LOWCST1 ^ LOWCST2) != 1.
Because of 1), LOWCST2 - LOWCST1 should be bigger than
HIGHCST1 - LOWCST1 (i.e. the value we = against in the end),
thus IMHO it should work fine even after generalization.

+   if (tree_log2 (tem)  0)
+ continue;

This is I guess cheaper than what I was doing there:
   gcc_checking_assert (!integer_zerop (lowxor));
   tem = fold_binary (MINUS_EXPR, type, lowxor,
  build_int_cst (type, 1));
   if (tem == NULL_TREE)
 continue;
   tem = fold_binary (BIT_AND_EXPR, type, lowxor, tem);
   if (tem == NULL_TREE || !integer_zerop (tem))
 continue;
to check for popcount (x) == 1.  Can you replace it even in the
preexisting code?
if (tree_log2 (lowxor)  0)
  continue;
?
Of course, if the two optimizations are merged into the same loop,
then some of the continue will need to be replaced by just conditional
code inside of the loop, handling the two different optimizations.

Thanks for working on this.
Clearly you should be the one reviewing this patch Jakub ;-)  I'm 
handing it off to you.



jeff


Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-10-09 Thread Jeff Law

On 08/05/13 02:08, Zhenqiang Chen wrote:

Hi

The patch reassociates X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1
into ((X - CST1)  ~(CST2 - CST1)) == 0.

Bootstrap on x86-64 and ARM chromebook.
No make check regression on x86-64 and panda board.

For some targets/options, the (X == CST1 || X == CST2) might be converted
to if (x == CST1) else if (x == CST2) at the beginning. In such case, the
patch will not be executed. It is hard to write a reliable testcase to
detect the transformation. So the testcase does not check the dump logs from
reassoc1 pass. It just checks the runtime result.

Is it OK for trunk?

Thanks!
-Zhenqiang

ChangeLog
2013-08-05  Zhenqiang Chen  zhenqiang.c...@arm.com

* tree-ssa-reassoc.c (optimize_range_tests): Reasociate
X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into
((X - CST1)  ~(CST2 - CST1)) == 0.

testsuite/ChangeLog
2013-08-05  Zhenqiang Chen  zhenqiang.c...@arm.com

* gcc.dg/reassoc1.c: New testcase.
This is an interesting transformation.  I suspect we'd want to gate it 
on something like BRANCH_COST.  For a related example, see how we handle 
(a != 0) || (b != 0) - (a | b) != 0 in fold-const.c.  I suspect the 
conditions under which we want to do your transformation are going to be 
similar if not the same as those transformations in fold-const.c.


Note I've been suggesting the bits I'm referring to in fold-const.c move 
out into the tree-ssa optimizers.  If they fit well into 
tree-ssa-reassoc.c I'd look favorably upon a patch which moved them.


As far as testing, the way to go will be to look at the reassoc1 dumps, 
but skip the test on targets with the wrong branch cost. 
tree-ssa/vrp87.c has a suitable condition to filter out the test on 
targets were the branch cost is too small.


Out of curiosity, what prompted you to make this transformation?  Was it 
mostly to deal with a codesize problem or is it a significant win on 
some hot code in a benchmark, or something else?  Closely related, have 
you done anything like instrument the transformation to see how often it 
applies during a GCC bootstrap?


jeff




RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-08-06 Thread Zhenqiang Chen

 -Original Message-
 From: Andrew Pinski [mailto:pins...@gmail.com]
 Sent: Monday, August 05, 2013 4:40 PM
 To: Zhenqiang Chen
 Cc: GCC Patches
 Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 -
 CST1) == 1 into ((X - CST1)  ~(CST2 - CST1)) == 0
 
 On Mon, Aug 5, 2013 at 1:08 AM, Zhenqiang Chen
 zhenqiang.c...@arm.com wrote:
  Hi
 
  The patch reassociates X == CST1 || X == CST2 if popcount (CST2 -
  CST1) == 1 into ((X - CST1)  ~(CST2 - CST1)) == 0.
 
  Bootstrap on x86-64 and ARM chromebook.
  No make check regression on x86-64 and panda board.
 
  For some targets/options, the (X == CST1 || X == CST2) might be
  converted to if (x == CST1) else if (x == CST2) at the beginning. In
  such case, the patch will not be executed. It is hard to write a
  reliable testcase to detect the transformation. So the testcase does
  not check the dump logs from
  reassoc1 pass. It just checks the runtime result.
 
  Is it OK for trunk?
 Seems like a better place to put this is inside tree-ssa-ifcombine.c which

The patch is similar with the  code for 
 Optimize X == CST1 || X == CST2
 if popcount (CST1 ^ CST2) == 1 into
 (X  ~(CST1 ^ CST2)) == (CST1  ~(CST1 ^ CST2))

Your RFC: Gimple combine/folding interface is a good design.
But before it, I think reusing code in tree-ssa-reassoc.c is the most easy way 
to implement it.

 handles the case where if(a || b) is split up into if(a) else if(b).
 Moving it into tree-ssa-ifcombine.c allows for code to be optimized which
 was written using the if(a) else if(b) form.

Yes. There might have opportunities to optimized if(a) else if(b) form for 
targets which LOGICAL_OP_NON_SHORT_CIRCUIT is FALSE.

If LOGICAL_OP_NON_SHORT_CIRCUIT is TRUE, a || b is converted to a | b in 
fold-const.c if they are simple operands.

Both reassoc and ifcombine optimize short circuiting. But they handle different 
scenarios. So I think we need both.

As you had mentioned, we need  enhance ifcombine to optimize if(a) else if(b) 
form using the similar methods from reassoc pass.
 
 Then it would easier to detect this for all targets and easier to remove from
 fold-const.c the removal of the short circuiting.

A unified interface as your RFC will be helpful for fold-const.c, 
tree-ssa-ifcombine.c and tree-ssa-reassoc.c.

If we can detect the optimization opportunity in fold_truth_andor, we do not 
need to split if(a || b) into if(a) else if(b) at all.

Thanks!
-Zhenqiang

  ChangeLog
  2013-08-05  Zhenqiang Chen  zhenqiang.c...@arm.com
 
  * tree-ssa-reassoc.c (optimize_range_tests): Reasociate
  X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into
  ((X - CST1)  ~(CST2 - CST1)) == 0.
 
  testsuite/ChangeLog
  2013-08-05  Zhenqiang Chen  zhenqiang.c...@arm.com
 
  * gcc.dg/reassoc1.c: New testcase.






Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-08-05 Thread Andrew Pinski
On Mon, Aug 5, 2013 at 1:08 AM, Zhenqiang Chen zhenqiang.c...@arm.com wrote:
 Hi

 The patch reassociates X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1
 into ((X - CST1)  ~(CST2 - CST1)) == 0.

 Bootstrap on x86-64 and ARM chromebook.
 No make check regression on x86-64 and panda board.

 For some targets/options, the (X == CST1 || X == CST2) might be converted
 to if (x == CST1) else if (x == CST2) at the beginning. In such case, the
 patch will not be executed. It is hard to write a reliable testcase to
 detect the transformation. So the testcase does not check the dump logs from
 reassoc1 pass. It just checks the runtime result.

 Is it OK for trunk?

Seems like a better place to put this is inside tree-ssa-ifcombine.c
which handles the case where if(a || b) is split up into if(a) else
if(b).
Moving it into tree-ssa-ifcombine.c allows for code to be optimized
which was written using the if(a) else if(b) form.

Then it would easier to detect this for all targets and easier to
remove from fold-const.c the removal of the short circuiting.

Thanks,
Andrew Pinski


 Thanks!
 -Zhenqiang

 ChangeLog
 2013-08-05  Zhenqiang Chen  zhenqiang.c...@arm.com

 * tree-ssa-reassoc.c (optimize_range_tests): Reasociate
 X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into
 ((X - CST1)  ~(CST2 - CST1)) == 0.

 testsuite/ChangeLog
 2013-08-05  Zhenqiang Chen  zhenqiang.c...@arm.com

 * gcc.dg/reassoc1.c: New testcase.


Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0

2013-08-05 Thread Andrew Pinski
On Mon, Aug 5, 2013 at 1:39 AM, Andrew Pinski pins...@gmail.com wrote:
 On Mon, Aug 5, 2013 at 1:08 AM, Zhenqiang Chen zhenqiang.c...@arm.com wrote:
 Hi

 The patch reassociates X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1
 into ((X - CST1)  ~(CST2 - CST1)) == 0.

 Bootstrap on x86-64 and ARM chromebook.
 No make check regression on x86-64 and panda board.

 For some targets/options, the (X == CST1 || X == CST2) might be converted
 to if (x == CST1) else if (x == CST2) at the beginning. In such case, the
 patch will not be executed. It is hard to write a reliable testcase to
 detect the transformation. So the testcase does not check the dump logs from
 reassoc1 pass. It just checks the runtime result.

 Is it OK for trunk?

 Seems like a better place to put this is inside tree-ssa-ifcombine.c
 which handles the case where if(a || b) is split up into if(a) else
 if(b).
 Moving it into tree-ssa-ifcombine.c allows for code to be optimized
 which was written using the if(a) else if(b) form.

 Then it would easier to detect this for all targets and easier to
 remove from fold-const.c the removal of the short circuiting.

The other place where it make sense to do this optimization is the
gimple combiner (right now that is really tree-ssa-forwprop.c).
Though it would better sense if reassoc and ifcombine uses the same
functions as the combiner (see my RFC which I don't have much time to
work on right now).

Thanks,
Andrew Pinski



 Thanks,
 Andrew Pinski


 Thanks!
 -Zhenqiang

 ChangeLog
 2013-08-05  Zhenqiang Chen  zhenqiang.c...@arm.com

 * tree-ssa-reassoc.c (optimize_range_tests): Reasociate
 X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into
 ((X - CST1)  ~(CST2 - CST1)) == 0.

 testsuite/ChangeLog
 2013-08-05  Zhenqiang Chen  zhenqiang.c...@arm.com

 * gcc.dg/reassoc1.c: New testcase.