On Fri, Sep 27, 2024 at 6:39 PM Artemiy Volkov <artemiy.vol...@synopsys.com> wrote: > > On 9/27/2024 1:24 PM, Richard Biener wrote: > > On Mon, 23 Sep 2024, Artemiy Volkov wrote: > > > >> Implement a match.pd transformation inverting the sign of X in > >> C1 - X cmp C2, where C1 and C2 are integer constants and X is > >> of a wrapping signed type, by observing that: > >> > >> (a) If cmp is == or !=, simply move X and C2 to opposite sides of > >> the comparison to arrive at X cmp C1 - C2. > >> > >> (b) If cmp is <: > >> - C1 - X < C2 means that C1 - X spans the values of -INF, > >> -INF + 1, ..., C2 - 1; > >> - Therefore, X is one of C1 - -INF, C1 - (-INF + 1), ..., > >> C1 - C2 + 1; > >> - Subtracting (C1 + 1), X - (C1 + 1) is one of - (-INF) - 1, > >> - (-INF) - 2, ..., -C2; > >> - Using the fact that - (-INF) - 1 is +INF, derive that > >> X - (C1 + 1) spans the values +INF, +INF - 1, ..., -C2; > >> - Thus, the original expression can be simplified to > >> X - (C1 + 1) > -C2 - 1. > >> > >> (c) Similarly, C1 - X <= C2 is equivalent to X - (C1 + 1) >= -C2 - 1. > >> > >> (d) The >= and > cases are negations of (b) and (c), respectively. > >> > >> (e) In all cases, the expression -C2 - 1 can be shortened to > >> bit_not (C2). > >> > >> This transformation allows to occasionally save load-immediate / > >> subtraction instructions, e.g. the following statement: > >> > >> 10 - (int)f() >= 20; > >> > >> now compiles to > >> > >> addi a0,a0,-11 > >> slti a0,a0,-20 > >> > >> instead of > >> > >> li a5,10 > >> sub a0,a5,a0 > >> slti t0,a0,20 > >> xori a0,t0,1 > >> > >> on 32-bit RISC-V when compiled with -fwrapv. > >> > >> Additional examples can be found in the newly added test file. This > >> patch has been bootstrapped and regtested on aarch64, x86_64, and i386, > >> and additionally regtested on riscv32. > >> > >> gcc/ChangeLog: > >> > >> PR tree-optimization/116024 > >> * match.pd: New transformation around integer comparison. > >> > >> gcc/testsuite/ChangeLog: > >> > >> * gcc.dg/tree-ssa/pr116024-1-fwrapv.c: New test. > >> > >> Signed-off-by: Artemiy Volkov <arte...@synopsys.com> > >> --- > >> gcc/match.pd | 20 ++++- > >> .../gcc.dg/tree-ssa/pr116024-1-fwrapv.c | 73 +++++++++++++++++++ > >> 2 files changed, 92 insertions(+), 1 deletion(-) > >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c > >> > >> diff --git a/gcc/match.pd b/gcc/match.pd > >> index d0489789527..bf3b4a2e3fe 100644 > >> --- a/gcc/match.pd > >> +++ b/gcc/match.pd > >> @@ -8970,7 +8970,25 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > >> (cmp (plus @1 (minus @2 @0)) @2)) > >> (if (cmp == LT_EXPR || cmp == GE_EXPR) > >> (cmp (plus @1 (minus @2 > >> - (plus @0 { build_one_cst (TREE_TYPE (@1)); }))) @2))))))) > >> + (plus @0 { build_one_cst (TREE_TYPE (@1)); }))) @2))) > >> +/* For wrapping signed types (-fwrapv), transform like so (using < as > >> example): > >> + C1 - X < C2 > >> + ==> C1 - X = { -INF, -INF + 1, ..., C2 - 1 } > >> + ==> X = { C1 - (-INF), C1 - (-INF + 1), ..., C1 - C2 + 1 } > >> + ==> X - (C1 + 1) = { - (-INF) - 1, - (-INF) - 2, ..., -C2 } > >> + ==> X - (C1 + 1) = { +INF, +INF - 1, ..., -C2 } > >> + ==> X - (C1 + 1) > -C2 - 1 > >> + ==> X - (C1 + 1) > bit_not (C2) > >> + > >> + Similarly, > >> + C1 - X <= C2 ==> X - (C1 + 1) >= bit_not (C2); > >> + C1 - X >= C2 ==> X - (C1 + 1) <= bit_not (C2); > >> + C1 - X > C2 ==> X - (C1 + 1) < bit_not (C2). */ > >> + (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@1))) > > > > You need to add && !TYPE_UNSIGNED (TREE_TYPE (@1)) here, > > TYPE_OVERFLOW_WRAPS is also true for unsigned types. > > This pattern is written as a series of nested ifs, and this if is > actually the else arm of the > if (TYPE_UNSIGNED (TREE_TYPE (@1)) > check on line 9078 (introduced in patch #2), so if we got here then > TYPE_UNSIGNED is already necessarily false (and thus this code already > only deals with signed wrapping types). I don't think the extra check > is necessary, but I can add it if it makes things clearer. What do you > think?
I think it's good for documenting to put in a redundant check. Large if-elseif-elseif chains can be difficult to understand. Note there's also (switch (if (...) ...) (if (...) ...) (if (...) ...)) which for a chain of cases is often better to use. Richard. > > Thanks, > Artemiy > > > > > OK with that change. > > > > Thanks, > > Richard. > > > >> + (if (cmp == EQ_EXPR || cmp == NE_EXPR) > >> + (cmp @1 (minus @0 @2)) > >> + (rcmp (minus @1 (plus @0 { build_one_cst (TREE_TYPE (@1)); })) > >> + (bit_not @2)))))))) > >> > >> /* Canonicalizations of BIT_FIELD_REFs. */ > >> > >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c > >> b/gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c > >> new file mode 100644 > >> index 00000000000..c2bf1d17234 > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116024-1-fwrapv.c > >> @@ -0,0 +1,73 @@ > >> +/* PR tree-optimization/116024 */ > >> +/* { dg-do compile } */ > >> +/* { dg-options "-O1 -fdump-tree-forwprop1-details -fwrapv" } */ > >> + > >> +#include <stdint.h> > >> + > >> +uint32_t f(void); > >> + > >> +int32_t i2(void) > >> +{ > >> + int32_t l = 2; > >> + l = 10 - (int32_t)f(); > >> + return l <= 20; // f() - 11 >= -21 > >> +} > >> + > >> +int32_t i2a(void) > >> +{ > >> + int32_t l = 2; > >> + l = 10 - (int32_t)f(); > >> + return l < 30; // f() - 11 > -31 > >> +} > >> + > >> +int32_t i2b(void) > >> +{ > >> + int32_t l = 2; > >> + l = 200 - (int32_t)f(); > >> + return l <= 100; // f() - 201 >= -101 > >> +} > >> + > >> +int32_t i2c(void) > >> +{ > >> + int32_t l = 2; > >> + l = 300 - (int32_t)f(); > >> + return l < 100; // f() - 301 > -101 > >> +} > >> + > >> +int32_t i2d(void) > >> +{ > >> + int32_t l = 2; > >> + l = 1000 - (int32_t)f(); > >> + return l >= 2000; // f() - 1001 <= -2001 > >> +} > >> + > >> +int32_t i2e(void) > >> +{ > >> + int32_t l = 2; > >> + l = 1000 - (int32_t)f(); > >> + return l > 3000; // f() - 1001 < -3001 > >> +} > >> + > >> +int32_t i2f(void) > >> +{ > >> + int32_t l = 2; > >> + l = 20000 - (int32_t)f(); > >> + return l >= 10000; // f() - 20001 <= -10001 > >> +} > >> + > >> +int32_t i2g(void) > >> +{ > >> + int32_t l = 2; > >> + l = 30000 - (int32_t)f(); > >> + return l > 10000; // f() - 30001 < -10001 > >> +} > >> + > >> +/* { dg-final { scan-tree-dump-times "Removing dead stmt:.*?- _" 8 > >> "forwprop1" } } */ > >> +/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ > >> -11.*\n.*>= -21" 1 "forwprop1" } } */ > >> +/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ > >> -11.*\n.*>= -30" 1 "forwprop1" } } */ > >> +/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ > >> -201.*\n.*>= -101" 1 "forwprop1" } } */ > >> +/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ > >> -301.*\n.*>= -100" 1 "forwprop1" } } */ > >> +/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ > >> -1001.*\n.*< -2000" 1 "forwprop1" } } */ > >> +/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ > >> -1001.*\n.*< -3001" 1 "forwprop1" } } */ > >> +/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ > >> -20001.*\n.*< -10000" 1 "forwprop1" } } */ > >> +/* { dg-final { scan-tree-dump-times "gimple_simplified to.* \\+ > >> -30001.*\n.*< -10001" 1 "forwprop1" } } */ > >> > > >