Re: [PATCH] Improved handling of shifts/rotates in bit CCP.

2021-08-24 Thread Richard Biener via Gcc-patches
On Sun, Aug 22, 2021 at 4:03 PM Roger Sayle  wrote:
>
> This patch is the next in the series to improve bit bounds in tree-ssa's
> bit CCP pass, this time: bounds for shifts and rotates by unknown amounts.
> This allows us to optimize expressions such as ((x&15)<<(y&24))&64.
> In this case, the expression (y&24) contains only two unknown bits,
> and can therefore have only four possible values: 0, 8, 16 and 24.
> From this, (x&15)<<(y&24) has the nonzero bits 0x0f0f0f0f, and from
> that ((x&15)<<(y&24))&64 must always be zero.
>
> One clever use of computer science in this patch is the use of XOR
> to efficiently enumerate bit patterns in Gray code order.  As the
> order in which we generate values is not significant, it's faster
> and more convenient to enumerate values by flipping one bit at a
> time, rather than in numerical order [which would require carry
> bits and additional logic].
>
> There's a pre-existing ??? comment in tree-ssa-ccp.c that we should
> eventually be able to optimize (x<<(y|8))&255, but this patch takes the
> conservatively paranoid approach of only optimizing cases where the
> shift/rotate is guaranteed to be less than the target precision, and
> therefore avoids changing any cases that might invoke undefined
> behavior.  This patch does optimize (x<<((y&31)|8))&255.
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.  OK for mainline?

OK.

Thanks,
Richard.

>
> 2021-08-22  Roger Sayle  
>
> gcc/ChangeLog
> 	* tree-ssa-ccp.c (get_individual_bits): Helper function to
> 	extract the individual bits from a widest_int constant (mask).
> 	(gray_code_bit_flips): New read-only table for efficiently
> 	enumerating permutations/combinations of bits.
> 	(bit_value_binop) [LROTATE_EXPR, RROTATE_EXPR]: Handle rotates
> 	by unknown counts that are guaranteed to be less than the target
> 	precision and have four or fewer unknown bits, by enumeration.
> 	[LSHIFT_EXPR, RSHIFT_EXPR]: Likewise, also handle shifts by
> 	enumeration under the same conditions.  Handle remaining
> 	shifts as a mask based upon the minimum possible shift value.
>
> gcc/testsuite/ChangeLog
> 	* gcc.dg/tree-ssa/ssa-ccp-41.c: New test case.
>
> Roger
> --


[PATCH] Improved handling of shifts/rotates in bit CCP.

2021-08-22 Thread Roger Sayle
 

This patch is the next in the series to improve bit bounds in tree-ssa's
bit CCP pass, this time: bounds for shifts and rotates by unknown amounts.
This allows us to optimize expressions such as ((x&15)<<(y&24))&64.
In this case, the expression (y&24) contains only two unknown bits,
and can therefore have only four possible values: 0, 8, 16 and 24.
From this, (x&15)<<(y&24) has the nonzero bits 0x0f0f0f0f, and from
that ((x&15)<<(y&24))&64 must always be zero.
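
For readers following along, here is a quick standalone sketch (not part
of the patch, and using plain unsigned arithmetic rather than widest_int)
that brute-forces the claim: the two unknown bits of (y&24) give exactly
four shift counts, and the union of every possible result of
(x&15)<<(y&24) is 0x0f0f0f0f, in which bit 64 is clear.

#include <assert.h>
#include <stdio.h>

int main (void)
{
  unsigned int accum = 0;
  for (unsigned int x = 0; x < 16; x++)                    /* all values of x&15 */
    for (unsigned int shift = 0; shift <= 24; shift += 8)  /* 0, 8, 16, 24 */
      accum |= x << shift;
  printf ("nonzero bits: 0x%08x\n", accum);                /* 0x0f0f0f0f */
  assert ((accum & 64) == 0);                              /* so ...&64 is always 0 */
  return 0;
}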

 

One clever use of computer science in this patch is the use of XOR
to efficiently enumerate bit patterns in Gray code order.  As the
order in which we generate values is not significant, it's faster
and more convenient to enumerate values by flipping one bit at a
time, rather than in numerical order [which would require carry
bits and additional logic].
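
To illustrate the trick (a toy mock-up with my own names, flips and bits,
mirroring the patch's gray_code_bit_flips table and its get_individual_bits
helper): each table entry names the unknown bit to flip next, so XOR-ing
one bit per step visits every combination of the unknown bits exactly once,
with no carry logic.

#include <stdio.h>

/* Index of the bit flipped between consecutive Gray codes; the first
   15 entries cover up to 4 unknown bits.  */
static const unsigned char flips[15] = {
  0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0
};

int main (void)
{
  unsigned int mask = 24;   /* unknown bits of the shift count, from (y&24) */
  unsigned int bits[4];
  unsigned int bit_count = 0;

  /* Split the mask into its individual bits, as get_individual_bits does.  */
  while (bit_count < 4 && mask != 0)
    {
      bits[bit_count] = mask & -mask;   /* extract lowest set bit */
      mask ^= bits[bit_count++];
    }

  /* Enumerate all 2^bit_count shift counts in Gray code order.  */
  unsigned int shift = 0;               /* known bits of the count */
  printf ("%u\n", shift);
  for (unsigned int i = 0; i < (1u << bit_count) - 1; i++)
    {
      shift ^= bits[flips[i]];          /* flip exactly one bit per step */
      printf ("%u\n", shift);           /* prints 8, 24, 16 */
    }
  return 0;
}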

 

There's a pre-existing ??? comment in tree-ssa-ccp.c that we should
eventually be able to optimize (x<<(y|8))&255, but this patch takes the
conservatively paranoid approach of only optimizing cases where the
shift/rotate is guaranteed to be less than the target precision, and
therefore avoids changing any cases that might invoke undefined
behavior.  This patch does optimize (x<<((y&31)|8))&255.
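
To make that guarantee concrete (my notation, in the spirit of the patch's
wi::ltu_p (r2val | r2mask, width) test): every possible shift count is at
most (known one bits | unknown bits), so a shift is safe to analyze
whenever that bound is below the type's precision.

#include <assert.h>

int main (void)
{
  const unsigned int width = 32;
  /* ((y & 31) | 8): known one bits are 8, unknown bits are 31 & ~8 = 23,
     so every possible count is <= (8 | 23) = 31 < 32: safe to optimize.  */
  assert ((8u | 23u) < width);
  /* (y | 8): for a 32-bit y the unknown bits are ~8u, so the bound is
     (8u | ~8u) = 0xffffffff >= 32, and the pass leaves the case alone.  */
  assert ((8u | ~8u) >= width);
  return 0;
}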

 

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.  OK for mainline?


2021-08-22  Roger Sayle  

 

gcc/ChangeLog
	* tree-ssa-ccp.c (get_individual_bits): Helper function to
	extract the individual bits from a widest_int constant (mask).
	(gray_code_bit_flips): New read-only table for efficiently
	enumerating permutations/combinations of bits.
	(bit_value_binop) [LROTATE_EXPR, RROTATE_EXPR]: Handle rotates
	by unknown counts that are guaranteed to be less than the target
	precision and have four or fewer unknown bits, by enumeration.
	[LSHIFT_EXPR, RSHIFT_EXPR]: Likewise, also handle shifts by
	enumeration under the same conditions.  Handle remaining
	shifts as a mask based upon the minimum possible shift value.

 

gcc/testsuite/ChangeLog

	* gcc.dg/tree-ssa/ssa-ccp-41.c: New test case.
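
(The testcase body isn't shown in this message; as a guess at its shape,
it presumably checks that the expression from the example above folds to
a constant zero at -O2.  A hypothetical sketch:)

/* Hypothetical sketch of gcc.dg/tree-ssa/ssa-ccp-41.c.  */
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */

int foo (int x, int y)
{
  return ((x & 15) << (y & 24)) & 64;   /* always zero, per the analysis */
}

/* { dg-final { scan-tree-dump "return 0;" "optimized" } } */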

 

 

Roger

--

 

diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index 1a63ae5..927a0aa 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -1448,6 +1448,34 @@ bit_value_mult_const (signop sgn, int width,
   *mask = wi::ext (sum_mask, width, sgn);
 }
 
+/* Fill up to MAX values in the BITS array with values representing
+   each of the non-zero bits in the value X.  Returns the number of
+   bits in X (capped at the maximum value MAX).  For example, an X
+   value of 11 places 1, 2 and 8 in BITS and returns the value 3.  */
+
+unsigned int
+get_individual_bits (widest_int *bits, widest_int x, unsigned int max)
+{
+  unsigned int count = 0;
+  while (count < max && x != 0)
+    {
+      int bitpos = wi::ctz (x);
+      bits[count] = wi::lshift (1, bitpos);
+      x ^= bits[count];
+      count++;
+    }
+  return count;
+}
+
+/* Array of 2^N - 1 values representing the bits flipped between
+   consecutive Gray codes.  This is used to efficiently enumerate
+   all permutations on N bits using XOR.  */
+static const unsigned char gray_code_bit_flips[63] = {
+  0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4,
+  0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 5,
+  0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, 4,
+  0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0
+};
 
 /* Apply the operation CODE in type TYPE to the value, mask pairs
R1VAL, R1MASK and R2VAL, R2MASK representing a values of type R1TYPE
@@ -1525,6 +1553,48 @@ bit_value_binop (enum tree_code code, signop sgn, int width,
}
}
}
+      else if (wi::ltu_p (r2val | r2mask, width)
+               && wi::popcount (r2mask) <= 4)
+        {
+          widest_int bits[4];
+          widest_int res_val, res_mask;
+          widest_int tmp_val, tmp_mask;
+          widest_int shift = wi::bit_and_not (r2val, r2mask);
+          unsigned int bit_count = get_individual_bits (bits, r2mask, 4);
+          unsigned int count = (1 << bit_count) - 1;
+
+          /* Initialize result to rotate by smallest value of shift.  */
+          if (code == RROTATE_EXPR)
+            {
+              res_mask = wi::rrotate (r1mask, shift, width);
+              res_val = wi::rrotate (r1val, shift, width);
+            }
+          else
+            {
+              res_mask = wi::lrotate (r1mask, shift, width);
+              res_val = wi::lrotate (r1val, shift, width);
+            }
+
+          /* Iterate through the remaining values of shift.  */
+          for (unsigned int i = 0; i < count; i++)
+            {
+              shift ^= bits[gray_code_bit_flips[i]];
+              if (code == RROTATE_EXPR)
+                {
+                  tmp_mask = wi::rrotate (r1mask, shift, width);
+                  tmp_val = wi::rrotate (r1val, shift, width);
+                }
+              else
+                {
+                  tmp_mask = wi::lrotate (r1mask, shift, width);
+                  tmp_val = wi::lrotate (r1val, shift, width);
+                }
+              /* Accumulate the result.  */
+              res_mask |= tmp_mask | (res_val ^ tmp_val);
+              res_val &= tmp_val;
+            }
+          *val = wi::bit_and_not (res_val, res_mask);
+          *mask = res_mask;
+        }

[...]

+          else if (code == LSHIFT_EXPR)
+            {
+              /* Handle the remaining left shifts as a mask based upon
+                 the minimum possible shift value.  */
+              widest_int tmp = wi::mask <widest_int> (width, false);
+              tmp <<= wi::ctz (r1val | r1mask);
+              tmp <<= wi::bit_and_not (r2val, r2mask);
+              *mask = wi::ext (tmp, width, sgn);
+              *val = 0;
+            }
+          else if (!wi::neg_p (r1val | r1mask, sgn))
+            {
+              /* Logical right shift, or zero sign bit.  */
+              widest_int arg = r1val | r1mask;
+