Thanks for the review!
Andrew Pinski <[email protected]> writes:
> On Wed, Mar 12, 2025 at 12:00 PM Richard Sandiford
> <[email protected]> wrote:
>>
>> Using a combination of rules, we were able to fold
>>
>> ((X >> C1) & C2) * (1 << C1) --> X & (C2 << C1)
>>
>> if everything was done at the same precision, but we couldn't fold
>> it if the AND was done at a different precision. The optimisation is
>> often (but not always) valid for that case too.
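
(To make the mixed-precision case concrete, here's a made-up, value-level
illustration rather than one of the new tests: with the >> and * done in a
32-bit unsigned type and the & done in a 16-bit unsigned type,

    ((unsigned short) (x >> 16) & 0x1234) * 0x10000

is equivalent to x & 0x12340000, since the conversion to the 16-bit type
only drops the 16 bits that the shift by C1 == 16 had already cleared.)
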
>>
>> This patch adds a dedicated rule for the case where different precisions
>> are involved.
>>
>> An alternative would be to extend the individual folds that together
>> handle the same-precision case so that those rules handle differing
>> precisions. But the risk is that that could replace narrow operations
>> with wide operations, which would be especially harmful on targets
>> like avr. It's also not obviously free of cycles.
>>
>> I also wondered whether the converts should be non-optional.
>>
>> gcc/
>> * match.pd: Fold ((X >> C1) & C2) * (1 << C1) to X & (C2 << C1).
>>
>> gcc/testsuite/
>> * gcc.dg/fold-mul-and-lshift-1.c: New test.
>> * gcc.dg/fold-mul-and-lshift-2.c: Likewise.
>> ---
>>  gcc/match.pd                                 | 29 ++++++++++
>>  gcc/testsuite/gcc.dg/fold-mul-and-lshift-1.c | 59 ++++++++++++++++++++
>>  gcc/testsuite/gcc.dg/fold-mul-and-lshift-2.c | 15 +++++
>>  3 files changed, 103 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.dg/fold-mul-and-lshift-1.c
>> create mode 100644 gcc/testsuite/gcc.dg/fold-mul-and-lshift-2.c
>>
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 5c679848bdf..3197d1cac75 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -5231,6 +5231,35 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> (if (mask)
>> (bit_op (shift (convert @0) @1) { mask; })))))))
>>
>> +/* Fold ((X >> C1) & C2) * (1 << C1) into X & (C2 << C1), including cases where
>> + the & happens in a different type. It is the conversion case that isn't
>> + a composition of other folds.
>> +
>> + Let the type of the * and >> be T1 and the type of the & be T2.
>> + The fold is valid if the conversion to T2 preserves all information;
>> + that is, if T2 is wider than T1 or drops no more than C1 bits from T1.
>> + In that case, the & might operate on bits that are dropped by the
>> + later conversion to T1 and the multiplication by (1 << C1), but those
>> + bits are also dropped by ANDing with C2 << C1 (converted to T1).
>> +
>> + If the conversion to T2 is not information-preserving, we have to be
>> + careful about the later conversion to T1 acting as a sign extension.
>> + We need either T2 to be unsigned or the top (sign) bit of C2 to be clear.
>> + That is equivalent to testing whether C2 is nonnegative. */
>> +(simplify
>> + (mult
>> +  (convert? (bit_and (convert? (rshift @0 INTEGER_CST@1)) INTEGER_CST@2))
>> +  INTEGER_CST@3)
>> + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>> +  (with { auto prec = element_precision (type); }
> Since we know this needs to be a scalar, using TYPE_PRECISION here is fine
> too.
Yeah, agreed.  I'd wondered whether to use TYPE_PRECISION instead,
but then I'd also wondered about trying to make the fold work for vectors.
Guess I ended up between two stools.
>> +   (if (wi::ltu_p (wi::to_widest (@1), prec))
>
> I think using wi::to_wide is better than using wi::to_widest here.
What's the reason for preferring wi::to_wide? wi::to_widest should
usually be more efficient for this kind of check, since the tree
representation allows the underlying HWIs to be used directly.
wi::to_wide instead requires masking off bits above the precision.
E.g. on an --enable-checking=release compiler:
bool
foo (tree t, unsigned int n)
{
  return wi::ltu_p (wi::to_widest (t), n);
}
gives:
188c: 79400c02 ldrh w2, [x0, #6]
1890: 7100045f cmp w2, #0x1
1894: 54000060 b.eq 18a0 <foo(tree_node*, unsigned int)+0x14> // b.none
1898: 52800000 mov w0, #0x0 // #0
189c: d65f03c0 ret
18a0: f9400800 ldr x0, [x0, #16]
18a4: eb21401f cmp x0, w1, uxtw
18a8: 1a9f27e0 cset w0, cc // cc = lo, ul, last
18ac: d65f03c0 ret
whereas:
bool
foo (tree t, unsigned int n)
{
  return wi::ltu_p (wi::to_wide (t), n);
}
gives:
188c: 79400802 ldrh w2, [x0, #4]
1890: 7100045f cmp w2, #0x1
1894: 54000060 b.eq 18a0 <foo(tree_node*, unsigned int)+0x14> // b.none
1898: 52800000 mov w0, #0x0 // #0
189c: d65f03c0 ret
18a0: a9408800 ldp x0, x2, [x0, #8]
18a4: 79406c03 ldrh w3, [x0, #54]
18a8: 92800000 mov x0, #0xffffffffffffffff // #-1
18ac: 7100fc7f cmp w3, #0x3f
18b0: 9ac32000 lsl x0, x0, x3
18b4: 8a200040 bic x0, x2, x0
18b8: 9a829002 csel x2, x0, x2, ls // ls = plast
18bc: eb21405f cmp x2, w1, uxtw
18c0: 1a9f27e0 cset w0, cc // cc = lo, ul, last
18c4: d65f03c0 ret
> The other place in match which checks shift count does:
> /* Use a signed compare to leave negative shift counts alone. */
> && wi::ges_p (wi::to_wide (uniform_integer_cst_p (@1)),
>               element_precision (type)))
>
>
>> +    (with { auto shift = tree_to_uhwi (@1); }
>> +     (if ((prec <= element_precision (TREE_TYPE (@2)) + shift
>> +           || wi::to_widest (@2) >= 0)
>
> I think `wi::to_widest (@2) >= 0` can be written as
> `!tree_int_cst_sign_bit (@2)`.
Ah, yeah, thanks.
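
For concreteness, the condition would then become something like the
following (just a sketch of the intended shape, not retested yet):

     (if ((prec <= element_precision (TREE_TYPE (@2)) + shift
           || !tree_int_cst_sign_bit (@2))

which tests the stored constant's sign bit directly instead of comparing a
widest_int against zero.
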
Richard