The mitigation for PR66313 (bogus re-association causing undefined overflow) causes missed optimizations because the fix was to perform possibly dangerous transforms in unsigned arithmetic. Unfortunately this loses knowledge of no-overflow which means SCEV analysis when facing the unsigned computations has to conclude IVs may wrap and thus are not affine when used in address context.
The following fix for this issue tries to delay dangerous transforms instead to a point where range info may allow them without resorting to unsigned arithmetic. So compared to the original fix we do less foldings (unless proved safe late). In the PR I also suggested to do a lowering during late GIMPLE to RTL semantics (-fwrapv) where all such transforms would become safe and also reassoc could handle signed arithmetic freely. This isn't something for GCC 8 though (and in its own naturally didn't help the PR), also because it requires some pass shuffling in the late pipeline. I'll revisit this for GCC 9. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2018-01-25 Richard Biener <rguent...@suse.de> PR tree-optimization/81082 * fold-const.c (fold_plusminus_mult_expr): Do not perform the association if it requires casting to unsigned. * match.pd ((A * C) +- (B * C) -> (A+-B)): New patterns derived from fold_plusminus_mult_expr to catch important cases late when range info is available. * gcc.dg/vect/pr81082.c: New testcase. Index: gcc/fold-const.c =================================================================== --- gcc/fold-const.c (revision 257047) +++ gcc/fold-const.c (working copy) @@ -7097,7 +7097,7 @@ fold_plusminus_mult_expr (location_t loc /* Same may be zero and thus the operation 'code' may overflow. Likewise same may be minus one and thus the multiplication may overflow. Perform - the operations in an unsigned type. */ + the sum operation in an unsigned type. */ tree utype = unsigned_type_for (type); tree tem = fold_build2_loc (loc, code, utype, fold_convert_loc (loc, utype, alt0), @@ -7110,9 +7110,9 @@ fold_plusminus_mult_expr (location_t loc return fold_build2_loc (loc, MULT_EXPR, type, fold_convert (type, tem), same); - return fold_convert_loc (loc, type, - fold_build2_loc (loc, MULT_EXPR, utype, tem, - fold_convert_loc (loc, utype, same))); + /* Do not resort to unsigned multiplication because + we lose the no-overflow property of the expression. */ + return NULL_TREE; } /* Subroutine of native_encode_expr. Encode the INTEGER_CST Index: gcc/match.pd =================================================================== --- gcc/match.pd (revision 257047) +++ gcc/match.pd (working copy) @@ -1939,6 +1939,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (minus (convert (view_convert:stype @1)) (convert (view_convert:stype @2))))))) +/* (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A -> A * (C+-1). + Modeled after fold_plusminus_mult_expr. */ +(if (!TYPE_SATURATING (type) + && (!FLOAT_TYPE_P (type) || flag_associative_math)) + (for plusminus (plus minus) + (simplify + (plusminus (mult:s @0 @1) (mult:cs @0 @2)) + (if (!ANY_INTEGRAL_TYPE_P (type) + || TYPE_OVERFLOW_WRAPS (type) + || (INTEGRAL_TYPE_P (type) + && tree_expr_nonzero_p (@0) + && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type))))) + (mult (plusminus @1 @2) @0))) + /* We cannot generate constant 1 for fract. */ + (if (!ALL_FRACT_MODE_P (TYPE_MODE (type))) + (simplify + (plusminus @0 (mult:cs @0 @2)) + (if (!ANY_INTEGRAL_TYPE_P (type) + || TYPE_OVERFLOW_WRAPS (type) + || (INTEGRAL_TYPE_P (type) + && tree_expr_nonzero_p (@0) + && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type))))) + (mult (plusminus { build_one_cst (type); } @2) @0))) + (simplify + (plusminus (mult:cs @0 @2) @0) + (if (!ANY_INTEGRAL_TYPE_P (type) + || TYPE_OVERFLOW_WRAPS (type) + || (INTEGRAL_TYPE_P (type) + && tree_expr_nonzero_p (@0) + && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type))))) + (mult (plusminus @2 { build_one_cst (type); }) @0)))))) /* Simplifications of MIN_EXPR, MAX_EXPR, fmin() and fmax(). */ Index: gcc/testsuite/gcc.dg/vect/pr81082.c =================================================================== --- gcc/testsuite/gcc.dg/vect/pr81082.c (nonexistent) +++ gcc/testsuite/gcc.dg/vect/pr81082.c (working copy) @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ + +int +f (int *x, int b1, int b2, int b3) +{ + int foo = 0; + for (int i1 = 0; i1 < b1; ++i1) + for (int i2 = 0; i2 < b2; ++i2) + for (int i3 = 0; i3 < b3; ++i3) + foo += x[i1 * b2 * b3 + i2 * b3 + (i3 - 1)]; + return foo; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */