RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0

Zhenqiang Chen Mon, 14 Oct 2013 00:11:45 -0700

Patch and test cases are updated according to your comments.

Bootstrap and no make check regression on X86-64.


ChangeLog:
2013-10-14  Zhenqiang Chen  <zhenqiang.c...@arm.com>

        * tree-ssa-reassoc.c (try_transfer_range_tests_1): New function,
        extracted from optimize_range_tests
        * (try_transfer_range_tests_2): New function.
        * (try_transfer_range_tests): New function, extracted from
        optimize_range_tests and calls try_transfer_range_tests_1 and
        try_transfer_range_tests_2,
        * (optimize_range_tests): Use try_transfer_range_tests.

testsuite/ChangeLog:
2013-10-14  Zhenqiang Chen  <zhenqiang.c...@arm.com>

        * gcc.dg/tree-ssa/reassoc-32.c: New test case.
        * gcc.dg/tree-ssa/reassoc-33.c: New test case.
        * gcc.dg/tree-ssa/reassoc-34.c: New test case.
        * gcc.dg/tree-ssa/reassoc-35.c: New test case.
        * gcc.dg/tree-ssa/reassoc-36.c: New test case.

> -----Original Message-----
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Saturday, October 12, 2013 3:40 PM
> To: Zhenqiang Chen
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2
-
> CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0
> 
> On Sat, Oct 12, 2013 at 10:08:12AM +0800, Zhenqiang Chen wrote:
> > As you had mentioned, the transition in this patch does not reduce
> > instructions. But the preexisting optimization does. So we prefer to
> > do the preexisting optimization first. E.g.
> >
> > X == 10 || X == 12 || X == 26
> 
> Ok, that makes sense.
> 
> @@ -2279,6 +2275,71 @@ optimize_range_tests (enum tree_code opcode,
>       }
>      }
> 
> +  /* Optimize X == CST1 || X == CST2
> +     if popcount (CST2 - CST1) == 1 into
> +     ((X - CST1) & ~(CST2 - CST1)) == 0.  */
> 
> Mention here also similarly to the other comment that it works also with
> ranges.
> 
> +  if (BRANCH_COST (optimize_function_for_speed_p (cfun), false) >= 2)
> +    for (i = first; i < length; i++)
> +      {
> +     tree lowi, highi, lowj, highj, type, tem1, tem2, mask;
> +
> +     if (ranges[i].exp == NULL_TREE || ranges[i].in_p)
> +       continue;
> +     type = TREE_TYPE (ranges[i].exp);
> +     if (!INTEGRAL_TYPE_P (type))
> +       continue;
> +     lowi = ranges[i].low;
> +     if (lowi == NULL_TREE)
> +       continue;
> 
> The other loop has here:
>       if (lowi == NULL_TREE)
>         lowi = TYPE_MIN_VALUE (type);
> instead, which is better, can you please change it?
> 
> +     highi = ranges[i].high;
> +     if (highi == NULL_TREE)
> +       continue;
> +     for (j = i + 1; j < length && j < i + 64; j++)
> +       {
> +         lowj = ranges[j].low;
> +         if (lowj == NULL_TREE)
> +           continue;
> +         highj = ranges[j].high;
> +         if (highj == NULL_TREE)
> +           continue;
> 
> The other loop has
>       if (highj == NULL_TREE)
>         highj = TYPE_MAX_VALUE (type);
> here instead.
> 
> +         if (ranges[j].exp == NULL_TREE || ranges[j].in_p
> +             || (ranges[i].exp != ranges[j].exp))
> +           continue;
> 
> Can you please move this test the lowj = assignment, and remove the
> ranges[j].exp == NULL_TREE test (both here and in the earlier loop)?  I
mean,
> we've checked at the beginning of for (i = first; i < length; i++) loop
that
> ranges[i].exp is not NULL_TREE, and if ranges[j].exp is NULL_TREE, it will
be
> different than ranges[i].exp.
> So
>       if (ranges[i].exp != ranges[j].exp || ranges[j].in_p)
>         continue;
> (and please also collapse the two checks in the first loop, so that both
look
> similar).
> 
> +         /* Check lowj > highi.  */
> +         tem1 = fold_binary (GT_EXPR, boolean_type_node,
> +                            lowj, highi);
> +         if (tem1 == NULL_TREE || !integer_onep (tem1))
> +           continue;
> +         /* Check highi - lowi == highj - lowj.  */
> +         tem1 = fold_binary (MINUS_EXPR, type, highi, lowi);
> +         if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST)
> +           continue;
> +         tem2 = fold_binary (MINUS_EXPR, type, highj, lowj);
> +         if (tem2 == NULL_TREE || TREE_CODE (tem2) != INTEGER_CST)
> +           continue;
> +         if (!tree_int_cst_equal (tem1, tem2))
> +           continue;
> +         /* Check popcount (lowj - lowi) == 1.  */
> +         tem1 = fold_binary (MINUS_EXPR, type, lowj, lowi);
> +         if (tem1 == NULL_TREE || TREE_CODE (tem1) != INTEGER_CST)
> +           continue;
> +         if (tree_log2 (tem1) < 0)
> +           continue;
> +         mask = fold_build1 (BIT_NOT_EXPR, type, tem1);
> +         tem1 = fold_binary (MINUS_EXPR, type, ranges[i].exp, lowi);
> +         tem1 = fold_build2 (BIT_AND_EXPR, type, tem1, mask);
> +         lowi = build_int_cst (type, 0);
> 
> Please use lowj instead of lowi here.  Because, if update_range_test
fails, we
> continue looking for further matches in following ranges, and while lowj
will
> be computed again, lowi will be assumed to contain the low bound of the
> first range.
> 
> +         if (update_range_test (ranges + i, ranges + j, 1, opcode, ops,
tem1,
> +                                ranges[i].in_p, lowi, tem2,
> 
> And here too.
> 
> Or, alternatively to avoid duplication, you could put the whole for (i =
first; i <
> length; i++) loop from the first optimization into another int i;
>   for (pass = 0; pass < (BRANCH_COST (optimize_function_for_speed_p
> (cfun),
>                                     false) >= 2) + 1; pass++)
>   {
>   }
> loop, use whatever is common in the loop unconditionally, and just
> conditionalize the rest on pass == 0 vs. pass == 1.
> Or maybe even better just move this whole loop into a new function, with
> ranges, opcode, first, length arguments plus another (bool?) argument
which
> of the two optimizations you want to perform, and return the bool
> any_changes.  Then you wouldn't run into line length issues that you could
> run into with the extra loop.
> 
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-33.c
> @@ -0,0 +1,38 @@
> +/* { dg-do run { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-*
> +v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-*
> +mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */
> +
> +/* { dg-options "-O2 -fno-inline -fdump-tree-reassoc1-details" } */
> +/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */
> +
> +int test (int a, int b, int c)
> +{
> +  if (a == 43 || a == 75 || a == 44 || a == 78 || a == 77 || a == 46 || a
== 76 ||
> a == 45)
> +    return b;
> +  else
> +    return c;
> +}
> +
> +int main ()
> +{
> +  if (test (43, 20, 30) != 20)
> +    __builtin_abort ();
> +  if (test (44, 20, 30) != 20)
> +    __builtin_abort ();
> +  if (test (45, 20, 30) != 20)
> +    __builtin_abort ();
> +  if (test (46, 20, 30) != 20)
> +    __builtin_abort ();
> +  if (test (75, 20, 30) != 20)
> +    __builtin_abort ();
> +  if (test (76, 20, 30) != 20)
> +    __builtin_abort ();
> +  if (test (77, 20, 30) != 20)
> +    __builtin_abort ();
> +  if (test (78, 20, 30) != 20)
> +    __builtin_abort ();
> +  if (test (30, 20, 30) != 30)
> +    __builtin_abort ();
> 
> Perhaps it would be better to test more than just one value outside of the
> range.  Say everything from -10 to 100 might be better, but you'd want to
> make sure the optimization can't be applied to the What about
> 
> int
> main ()
> {
>   volatile int n43, n47, n75, n79;
>   n43 = 43; n47 = n43 + 4; n75 = 75; n79 = n75 + 4;
>   int i;
>   for (i = -10; i <= 100; i++)
>     if (test (i, 20, 30) != 30 - ((i >= n43 && i < n47)
>                                 || (i >= n75 && i < n79)) * 10)
>       __builtin_abort ();
>   return 0;
> }
> 
> ?
> 
>       Jakub

reassoc-updated.patch
Description: Binary data

RE: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) & ~(CST2 - CST1)) == 0

Reply via email to