[Bug tree-optimization/111770] predicated loads inactive lane values not modelled

2024-02-22 Thread jsm28 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111770

--- Comment #6 from Joseph S. Myers  ---
X + 0. -> X is invalid for FP with signed zero or signaling NaN, and also gets
the wrong quantum exponent for decimal FP unless the zero has the maximum
possible quantum exponent (which is not what you get from all bits of the
representation zero, which is a zero with the minimum possible quantum
exponent, or from converting integer 0 to DFP, which has quantum exponent 0).

(If you add -0. (with maximum quantum exponent, in the DFP case) instead of
+0., that does produce X for X not a signaling NaN - except in FE_DOWNWARD
mode. Whereas adding +0. is only correct in FE_DOWNWARD mode, since 0. - 0. has
positive sign in all other modes.)

[Bug tree-optimization/111770] predicated loads inactive lane values not modelled

2024-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111770

--- Comment #5 from Richard Biener  ---
(In reply to Alex Coplan from comment #4)
> (In reply to Richard Biener from comment #3)
> > Now, if-conversion could indeed elide the .COND_ADD for integers.  It's
> > problematic there only because of signed overflow undefinedness, so
> > you shouldn't see it for 'unsigned' already, and adding zero is safe.
> 
> Can you elaborate on this a bit? Do you mean to say that the .COND_ADD is
> only there to avoid if-conversion introducing UB due to signed overflow?

No, you are right.

> ISTM it's needed for correctness even without that, as the addend needn't be
> guaranteed to be zero in the general case.
> 
> > if-conversion would need to have an idea of all the ranges involved here
> > so it might be a bit sophisticated to get it right.
> 
> Does what I suggested above make any sense, or do you have in mind a
> different way of handling this in if-conversion? I'm wondering how ifcvt
> should determine that the addend is zero in the case where the predicate is
> false.

ifcvt would need to compute said fact, say, keep a lattice of the value
(or value-range) that's there when the block isn't executed (simulating
a disabled vector lane).  A load that's going to be replaced by a
.MASK_LOAD can be then known zero and this needs to be propagated through
regular stmts (like the multiply).  There's also .COND_* which if-conversion
could actually provide the else value for - like if we have a following
division to avoid dividing by zero.  But that would be propagating backwards
(I'd still have this usecase in mind).

[Bug tree-optimization/111770] predicated loads inactive lane values not modelled

2024-02-22 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111770

--- Comment #4 from Alex Coplan  ---
(In reply to Richard Biener from comment #3)
> As said X + 0. -> X is an invalid transform with FP unless there are no
> signed zeros (maybe also problematic with sign-dependent rounding).

Yeah, I was thinking about the integer case above.

> 
> I think we agree to define .MASK_LOAD to zero masked elements.  When we need
> something else we need to add an explicit ELSE value.  That needs documenting
> of course and also possibly testsuite coverage - I _think_ you should be able
> to do a GIMPLE frontend testcase for this.

Sounds good, thanks.

> 
> Note this behavior would extend to .MASK_GATHER_LOAD as well as
> the load-lanes and -len variants.
> 
> Unfortunately we do not have _any_ internals manual documentation for
> internal functions - but you can backtrack to the optabs documentation
> where this would need documenting.
> 
> Now, if-conversion could indeed elide the .COND_ADD for integers.  It's
> problematic there only because of signed overflow undefinedness, so
> you shouldn't see it for 'unsigned' already, and adding zero is safe.

Can you elaborate on this a bit? Do you mean to say that the .COND_ADD is only
there to avoid if-conversion introducing UB due to signed overflow? ISTM it's
needed for correctness even without that, as the addend needn't be guaranteed
to be zero in the general case.

> if-conversion would need to have an idea of all the ranges involved here
> so it might be a bit sophisticated to get it right.

Does what I suggested above make any sense, or do you have in mind a different
way of handling this in if-conversion? I'm wondering how ifcvt should determine
that the addend is zero in the case where the predicate is false.

Thanks

[Bug tree-optimization/111770] predicated loads inactive lane values not modelled

2024-02-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111770

--- Comment #3 from Richard Biener  ---
As said X + 0. -> X is an invalid transform with FP unless there are no signed
zeros (maybe also problematic with sign-dependent rounding).

I think we agree to define .MASK_LOAD to zero masked elements.  When we need
something else we need to add an explicit ELSE value.  That needs documenting
of course and also possibly testsuite coverage - I _think_ you should be able
to do a GIMPLE frontend testcase for this.

Note this behavior would extend to .MASK_GATHER_LOAD as well as
the load-lanes and -len variants.

Unfortunately we do not have _any_ internals manual documentation for
internal functions - but you can backtrack to the optabs documentation
where this would need documenting.

Now, if-conversion could indeed elide the .COND_ADD for integers.  It's
problematic there only because of signed overflow undefinedness, so
you shouldn't see it for 'unsigned' already, and adding zero is safe.
if-conversion would need to have an idea of all the ranges involved here
so it might be a bit sophisticated to get it right.

[Bug tree-optimization/111770] predicated loads inactive lane values not modelled

2024-02-21 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111770

--- Comment #2 from Alex Coplan  ---
I think to progress this and related cases we need to have .MASK_LOAD defined
to zero in the case that the predicate is false (either unconditionally for all
targets if possible or otherwise conditionally for targets where that is safe).

Here is a related case:

int bar(int n, char *a, char *b, char *c) {
  int sum = 0;
  for (int i = 0; i < n; ++i)
if (c[i] == 0)
  sum += a[i] * b[i];
  return sum;
}

in this case we get the missed optimization even before vectorization during
ifcvt (in some ways it is a simpler case to consider as only scalars are
involved).  Here with -O3 -march=armv9-a from ifcvt we get:

   [local count: 955630224]:
  # sum_23 = PHI <_ifc__41(8), 0(18)>
  # i_25 = PHI 
  _1 = (sizetype) i_25;
  _2 = c_16(D) + _1;
  _3 = *_2;
  _29 = _3 == 0;
  _43 = _42 + _1;
  _4 = (char *) _43;
  _5 = .MASK_LOAD (_4, 8B, _29);
  _6 = (int) _5;
  _45 = _44 + _1;
  _7 = (char *) _45;
  _8 = .MASK_LOAD (_7, 8B, _29);
  _9 = (int) _8;
  _46 = (unsigned int) _6;
  _47 = (unsigned int) _9;
  _48 = _46 * _47;
  _10 = (int) _48;
  _ifc__41 = .COND_ADD (_29, sum_23, _10, sum_23);

for this case it should be possible to use an unpredicated add instead of a
.COND_ADD.  We essentially need to show that this transformation is valid:

  _29 ? sum_23 + _10 : sum_23 --> sum_23 + _10

and this essentially boils down to showing that:

  _29 = false => _10 = 0

now I'm not sure if there's a way of match-and-simplifying some GIMPLE
expression under the assumption that a given SSA name takes a particular value;
but if there were, and we defined .MASK_LOAD to zero given a false predicate,
then we could evaluate _10 under the assumption that _29 = false, which if we
added some simple match.pd rule for .MASK_LOAD with a false predicate would
allow it to evaluate to zero, and thus we could establish _10 = 0 proving the
transformation is correct.  If such an approach is possible then I guess ifcvt
could use it to avoid conditionalizing statements unnecessarily.

Richi: any thoughts on the above or on how we should handle this sort of thing?

[Bug tree-optimization/111770] predicated loads inactive lane values not modelled

2023-10-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111770

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2023-10-11
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from Richard Biener  ---
"Generally" we need an "else" value for .MASK_LOAD and there model "don't
care".
Elsewhere we observed that most uarchs do zero masked elements (or at least
offer so) and thus .MASK_LOAD without else value might be interpreted as
doing that (until I came along in the other related PR sugggesting an omitted
'else' value means 'don't care' - but IIRC the RISC-V folks ended up
implementing that with default-defs).

Btw, we should possibly vectorize this with a COND_DOT_PROD, since adding
zeros isn't correct for HONOR_SIGNED_ZEROS/HONOR_SIGN_DEPENDENT_ROUNDING.