[Bug tree-optimization/111770] predicated loads inactive lane values not modelled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111770 --- Comment #6 from Joseph S. Myers --- X + 0. -> X is invalid for FP with signed zero or signaling NaN, and also gets the wrong quantum exponent for decimal FP unless the zero has the maximum possible quantum exponent (which is not what you get from all bits of the representation zero, which is a zero with the minimum possible quantum exponent, or from converting integer 0 to DFP, which has quantum exponent 0). (If you add -0. (with maximum quantum exponent, in the DFP case) instead of +0., that does produce X for X not a signaling NaN - except in FE_DOWNWARD mode. Whereas adding +0. is only correct in FE_DOWNWARD mode, since 0. - 0. has positive sign in all other modes.)
[Bug tree-optimization/111770] predicated loads inactive lane values not modelled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111770 --- Comment #5 from Richard Biener --- (In reply to Alex Coplan from comment #4) > (In reply to Richard Biener from comment #3) > > Now, if-conversion could indeed elide the .COND_ADD for integers. It's > > problematic there only because of signed overflow undefinedness, so > > you shouldn't see it for 'unsigned' already, and adding zero is safe. > > Can you elaborate on this a bit? Do you mean to say that the .COND_ADD is > only there to avoid if-conversion introducing UB due to signed overflow? No, you are right. > ISTM it's needed for correctness even without that, as the addend needn't be > guaranteed to be zero in the general case. > > > if-conversion would need to have an idea of all the ranges involved here > > so it might be a bit sophisticated to get it right. > > Does what I suggested above make any sense, or do you have in mind a > different way of handling this in if-conversion? I'm wondering how ifcvt > should determine that the addend is zero in the case where the predicate is > false. ifcvt would need to compute said fact, say, keep a lattice of the value (or value-range) that's there when the block isn't executed (simulating a disabled vector lane). A load that's going to be replaced by a .MASK_LOAD can be then known zero and this needs to be propagated through regular stmts (like the multiply). There's also .COND_* which if-conversion could actually provide the else value for - like if we have a following division to avoid dividing by zero. But that would be propagating backwards (I'd still have this usecase in mind).
[Bug tree-optimization/111770] predicated loads inactive lane values not modelled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111770 --- Comment #4 from Alex Coplan --- (In reply to Richard Biener from comment #3) > As said X + 0. -> X is an invalid transform with FP unless there are no > signed zeros (maybe also problematic with sign-dependent rounding). Yeah, I was thinking about the integer case above. > > I think we agree to define .MASK_LOAD to zero masked elements. When we need > something else we need to add an explicit ELSE value. That needs documenting > of course and also possibly testsuite coverage - I _think_ you should be able > to do a GIMPLE frontend testcase for this. Sounds good, thanks. > > Note this behavior would extend to .MASK_GATHER_LOAD as well as > the load-lanes and -len variants. > > Unfortunately we do not have _any_ internals manual documentation for > internal functions - but you can backtrack to the optabs documentation > where this would need documenting. > > Now, if-conversion could indeed elide the .COND_ADD for integers. It's > problematic there only because of signed overflow undefinedness, so > you shouldn't see it for 'unsigned' already, and adding zero is safe. Can you elaborate on this a bit? Do you mean to say that the .COND_ADD is only there to avoid if-conversion introducing UB due to signed overflow? ISTM it's needed for correctness even without that, as the addend needn't be guaranteed to be zero in the general case. > if-conversion would need to have an idea of all the ranges involved here > so it might be a bit sophisticated to get it right. Does what I suggested above make any sense, or do you have in mind a different way of handling this in if-conversion? I'm wondering how ifcvt should determine that the addend is zero in the case where the predicate is false. Thanks
[Bug tree-optimization/111770] predicated loads inactive lane values not modelled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111770 --- Comment #3 from Richard Biener --- As said X + 0. -> X is an invalid transform with FP unless there are no signed zeros (maybe also problematic with sign-dependent rounding). I think we agree to define .MASK_LOAD to zero masked elements. When we need something else we need to add an explicit ELSE value. That needs documenting of course and also possibly testsuite coverage - I _think_ you should be able to do a GIMPLE frontend testcase for this. Note this behavior would extend to .MASK_GATHER_LOAD as well as the load-lanes and -len variants. Unfortunately we do not have _any_ internals manual documentation for internal functions - but you can backtrack to the optabs documentation where this would need documenting. Now, if-conversion could indeed elide the .COND_ADD for integers. It's problematic there only because of signed overflow undefinedness, so you shouldn't see it for 'unsigned' already, and adding zero is safe. if-conversion would need to have an idea of all the ranges involved here so it might be a bit sophisticated to get it right.
[Bug tree-optimization/111770] predicated loads inactive lane values not modelled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111770 --- Comment #2 from Alex Coplan --- I think to progress this and related cases we need to have .MASK_LOAD defined to zero in the case that the predicate is false (either unconditionally for all targets if possible or otherwise conditionally for targets where that is safe). Here is a related case: int bar(int n, char *a, char *b, char *c) { int sum = 0; for (int i = 0; i < n; ++i) if (c[i] == 0) sum += a[i] * b[i]; return sum; } in this case we get the missed optimization even before vectorization during ifcvt (in some ways it is a simpler case to consider as only scalars are involved). Here with -O3 -march=armv9-a from ifcvt we get: [local count: 955630224]: # sum_23 = PHI <_ifc__41(8), 0(18)> # i_25 = PHI _1 = (sizetype) i_25; _2 = c_16(D) + _1; _3 = *_2; _29 = _3 == 0; _43 = _42 + _1; _4 = (char *) _43; _5 = .MASK_LOAD (_4, 8B, _29); _6 = (int) _5; _45 = _44 + _1; _7 = (char *) _45; _8 = .MASK_LOAD (_7, 8B, _29); _9 = (int) _8; _46 = (unsigned int) _6; _47 = (unsigned int) _9; _48 = _46 * _47; _10 = (int) _48; _ifc__41 = .COND_ADD (_29, sum_23, _10, sum_23); for this case it should be possible to use an unpredicated add instead of a .COND_ADD. We essentially need to show that this transformation is valid: _29 ? sum_23 + _10 : sum_23 --> sum_23 + _10 and this essentially boils down to showing that: _29 = false => _10 = 0 now I'm not sure if there's a way of match-and-simplifying some GIMPLE expression under the assumption that a given SSA name takes a particular value; but if there were, and we defined .MASK_LOAD to zero given a false predicate, then we could evaluate _10 under the assumption that _29 = false, which if we added some simple match.pd rule for .MASK_LOAD with a false predicate would allow it to evaluate to zero, and thus we could establish _10 = 0 proving the transformation is correct. If such an approach is possible then I guess ifcvt could use it to avoid conditionalizing statements unnecessarily. Richi: any thoughts on the above or on how we should handle this sort of thing?
[Bug tree-optimization/111770] predicated loads inactive lane values not modelled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111770 Richard Biener changed: What|Removed |Added Last reconfirmed||2023-10-11 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW --- Comment #1 from Richard Biener --- "Generally" we need an "else" value for .MASK_LOAD and there model "don't care". Elsewhere we observed that most uarchs do zero masked elements (or at least offer so) and thus .MASK_LOAD without else value might be interpreted as doing that (until I came along in the other related PR sugggesting an omitted 'else' value means 'don't care' - but IIRC the RISC-V folks ended up implementing that with default-defs). Btw, we should possibly vectorize this with a COND_DOT_PROD, since adding zeros isn't correct for HONOR_SIGNED_ZEROS/HONOR_SIGN_DEPENDENT_ROUNDING.