https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81082
Jakub Jelinek &lt;jakub at gcc dot gnu.org&gt; changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek &lt;jakub at gcc dot gnu.org&gt; ---
So is there anything we can do here?  Isn't the bigger problem that we no
longer notice that the multiplications can't overflow?  With the int
multiplications gone, we have to assume [1, INT_MAX] ranges on b1, b2 and b3
in the loop (otherwise it wouldn't loop at all), and [0, INT_MAX - 1] ranges
for i1, i2 and i3.  But from the i1 * b2 * b3 + i2 * b3 + (i3 - 1) indexing we
could have determined that b1 * b2 * b3 is [0, INT_MAX].

We do vectorize:

int f (int *x, int b1, int b2, int b3)
{
  int foo = 0;
  for (int i1 = 0; i1 < b1; ++i1)
    for (int i2 = 0; i2 < b2; ++i2)
      for (int i3 = 0; i3 < b3; ++i3)
        foo += x[(i1 * b2 + i2) * b3 + (i3 - 1)];
  return foo;
}

without gather loads, while the #c0 loop is vectorized only with them (if
available).

Regarding the PR66313 optimization, first of all, what happened to the move of
it to match.pd?  As we vectorize:

int f (int *x, int b1, int b2, int b3)
{
  int foo = 0;
  for (int i1 = 0; i1 < b1; ++i1)
    for (int i2 = 0; i2 < b2; ++i2)
      for (int i3 = 0; i3 < b3; ++i3)
        {
          int t1 = i1 * b2 * b3;
          int t2 = i2 * b3;
          int t3 = i3 - 1;
          foo += x[t1 + t2 + t3];
        }
  return foo;
}

it seems that hasn't been done.

Second thing, in fold_plusminus_mult_expr I don't understand the

          /* We are neither factoring zero nor minus one.  */
          || TREE_CODE (same) == INTEGER_CST)

part; shouldn't that be

          || (TREE_CODE (same) == INTEGER_CST
              && !integer_zerop (same)
              && !integer_minus_onep (same))

?

Another thing is that for signed a, b, c we can't optimize a * b + a * c as
a * (b + c) if a can be -1 or 0; exhaustive proof for signed char:

int
main ()
{
  int a, b, c;
  for (a = -128; a < 128; a++)
    for (b = -128; b < 128; b++)
      for (c = -128; c < 128; c++)
        {
          int ab = a * b;
          int ac = a * c;
          int ab_ac = ab + ac;
          if (ab < -128 || ab >= 128
              || ac < -128 || ac >= 128
              || ab_ac < -128 || ab_ac >= 128)
            continue;
          /* Ok, in this case a * b + a * c is without UB.  */
          int b_c = b + c;
#ifdef NEW
          b_c = (signed char) b_c;
#endif
          int r = a * b_c;
          if (b_c < -128 || b_c >= 128 || r != ab_ac)
            __builtin_printf ("%d * %d + %d * %d = %d wrong opt (%d)\n",
                              a, b, a, c, ab_ac, r);
        }
  return 0;
}

But if we know for sure that a is not -1, we could optimize it as
a * (signed) ((unsigned) b + c) (which doesn't work for a == -1, but works for
the others) instead of (signed) (a * ((unsigned) b + c)) (which works for all
values, but is too big a hammer).  Perhaps, if this optimization is moved over
to match.pd, we could defer it when a isn't constant until at least vrp1 and
tune it based on range info?  The range of b3, at least during the vrp pass
itself, is [1, INT_MAX], so neither -1 nor 0 is possible and we could safely
reassociate as a * (b + c) rather than the variants with unsigned arithmetic.
Or do we want to somehow mark the MULT_EXPR we've created, propagate it
through the IL, and if vrp notices such a marked MULT_EXPR (or IFN), fold it
as an integer multiplication?  That would be ugly.
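As a sanity check of the last two claims, here is a sketch (not from the
original comment) that extends the exhaustive signed char proof above to the
two unsigned variants.  The wrap8 helper is introduced only for this
simulation; it reduces a value to the signed char range the way the
(signed) cast of an unsigned result would at int width, assuming the usual
wrapping behaviour of that implementation-defined conversion.

/* Exhaustive check, at signed char precision, of:
   variant 1:  a * (signed) ((unsigned) b + c)          -- claimed safe for a != -1
   variant 2:  (signed) ((unsigned) a * ((unsigned) b + c))  -- claimed safe always  */

/* Reduce x modulo 256 into [-128, 127], simulating the narrowing of an
   unsigned result back to the signed type.  Helper added for the sketch.  */
static int
wrap8 (int x)
{
  return ((x & 0xff) ^ 0x80) - 0x80;
}

int
main ()
{
  int a, b, c;
  for (a = -128; a < 128; a++)
    for (b = -128; b < 128; b++)
      for (c = -128; c < 128; c++)
        {
          int ab = a * b;
          int ac = a * c;
          int ab_ac = ab + ac;
          if (ab < -128 || ab >= 128
              || ac < -128 || ac >= 128
              || ab_ac < -128 || ab_ac >= 128)
            continue;
          /* Here a * b + a * c is without UB.  */
          int b_c = wrap8 (b + c);   /* (signed) ((unsigned) b + c) */
          /* Variant 1: the outer multiplication stays signed; it must not
             overflow and must match, as long as a != -1.  */
          int r1 = a * b_c;
          if (a != -1 && (r1 < -128 || r1 >= 128 || r1 != ab_ac))
            __builtin_printf ("variant 1 wrong for %d, %d, %d\n", a, b, c);
          /* Variant 2: the whole computation is done in unsigned; it must
             match for every a.  */
          int r2 = wrap8 (a * b_c);
          if (r2 != ab_ac)
            __builtin_printf ("variant 2 wrong for %d, %d, %d\n", a, b, c);
        }
  return 0;
}

Under these assumptions the program should print nothing, which matches the
analysis that only a == -1 (e.g. a = -1, b = 127, c = 1, where b + c wraps to
-128) rules out keeping the outer multiplication signed.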